home *** CD-ROM | disk | FTP | other *** search
Text File | 1990-07-31 | 356.9 KB | 9,312 lines |
-
-
-
-
-
-
-
-
-
-
- *****************************************************************
- * *
- * *
- * VANILLA SNOBOL4 *
- * *
- * TUTORIAL AND REFERENCE MANUAL *
- * *
- * (c) Copyright 1985, 1988 by Catspaw, Inc. *
- * *
- *****************************************************************
-
-
-
- Mark B. Emmer
-
- Catspaw, Inc.
- P.O. Box 1123
- Salida, Colorado 81201 USA
- Telephone: (719) 539-3884
-
-
- Vanilla SNOBOL4 and all accompanying documentation are copy-
- righted materials. However, they may be copied and shared
- provided the following terms are adhered to:
-
- 1. No fee greater than $10 is charged for use, copying or dis-
- tribution.
-
- 2. SNOBOL4.EXE and all documentation are not modified in any
- way, and are distributed together.
-
- 3. The manual may not be packaged with any other product.
-
- 4. Neither SNOBOL4+ (our commercial product), nor its printed
- manual, may be copied.
-
- Vanilla SNOBOL4 was released because we believe many people
- would enjoy programming in SNOBOL4, if there was a version of
- the language that was widely and freely available. Contribu-
- tions are NOT requested. Enjoy and share it!
-
-
-
-
-
- TABLE OF CONTENTS
- -----------------------------------------------------------------
-
-
-
- PART I -- GETTING STARTED
-
- Chapter 1 Getting Started 1
- 1.1 About This Manual.........................1
- 1.2 Installing Vanilla SNOBOL4................1
- 1.3 An Example................................2
-
- Chapter 2 First Program 4
- 2.1 A First Program...........................4
- 2.2 Interactive Statement Execution...........6
-
-
- PART II -- TUTORIAL
-
- Chapter 3 Fundamentals 8
- 3.1 Simple Data Types.........................8
- 3.2 Simple Operators.........................10
- 3.3 Variables................................14
-
- Chapter 4 Control Flow and Functions 17
- 4.1 Success and Failure......................17
- 4.2 A SNOBOL4 Statement......................18
- 4.3 Built-In Functions.......................19
-
- Chapter 5 Input/Output and Keywords 23
- 5.1 Input/Output.............................23
- 5.2 Keywords.................................26
- 5.3 Programs Without Pattern Matching........28
-
- Chapter 6 Pattern Matching 30
- 6.1 Introduction.............................30
- 6.2 Specifying Pattern Matching..............31
- 6.3 Subject String...........................31
- 6.4 Pattern Subsequents and Alternates.......32
- 6.5 Simple Pattern Matches...................33
- 6.6 The Pattern Data Type....................34
- 6.7 Capturing Match Results..................34
- 6.8 Unknowns.................................35
- 6.9 Pattern Matching with Replacement........42
- 6.10 Sample Programs..........................44
- 6.11 Anchored and Unanchored Matching.........48
-
- Chapter 7 Additional Operators and Data Types 49
- 7.1 Indirect Reference.......................49
- 7.2 Unevaluated Expressions..................52
- 7.3 Immediate Assignment.....................53
- 7.4 Arrays...................................55
- 7.5 Tables...................................57
- 7.6 The Name Operator........................61
-
-
-
- - i -
-
-
-
-
-
- Chapter 8 Program-Defined Objects 63
- 8.1 Program-Defined Functions................63
- 8.2 Program-Defined Data Types...............71
- 8.3 Program-Defined Operators................74
-
- Chapter 9 Advanced Topics 77
- 9.1 The ARBNO Function.......................77
- 9.2 Recursive Patterns.......................78
- 9.3 Quickscan and Fullscan...................78
- 9.4 Other Primitive Patterns.................80
- 9.5 Other Functions..........................82
- 9.6 Other Unary Operators....................83
- 9.7 Run-time Compilation.....................83
-
- Chapter 10 Debugging and Program Efficiency 86
- 10.1 Debugging and Tracing....................86
- 10.2 Execution Tracing........................90
- 10.3 Program Efficiency.......................93
-
- Chapter 11 Concluding Remarks 95
-
-
- PART III -- REFERENCE MANUAL
-
- Chapter 12 Introduction 96
- 12.1 Language Background......................96
-
- Chapter 13 Running a SNOBOL4 Program 98
- 13.1 Basic Command Line Format................98
- 13.2 Providing Your Own Parameters............99
- 13.3 Command Line Examples...................100
-
- Chapter 14 Statements 101
- 14.1 Comment Statements......................101
- 14.2 Control Statements......................101
- 14.3 Program Statements......................102
- 14.4 Continuation Statements.................105
- 14.5 Multiple Statements.....................105
- 14.6 The END Statement.......................106
-
- Chapter 15 Operators 107
- 15.1 Unary Operators.........................107
- 15.2 Binary Operators........................108
-
- Chapter 16 Keywords 109
- 16.1 Protected Keywords......................109
- 16.2 Unprotected Keywords....................110
- 16.3 Special Names...........................112
-
- Chapter 17 Data Types and Conversion 113
- 17.1 Data Type Names.........................113
- 17.2 Data Type Conversion....................116
-
-
-
-
-
- - ii -
-
-
-
-
-
- Chapter 18 Patterns and Pattern Functions 121
- 18.1 Primitive Patterns......................121
- 18.2 Primitive Pattern Functions.............122
-
- Chapter 19 Built-In Functions 124
-
- Chapter 20 System Messages 138
- 20.1 Initial Messages........................138
- 20.2 Termination Messages....................138
- 20.3 Compilation Messages....................139
- 20.4 Execution Error Messages................141
- 20.5 Execution Trace Messages................144
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - iii -
-
-
-
-
-
- Chapter 1
-
-
- INSTALLATION
- -----------------------------------------------------------------
-
- Welcome to the world of SNOBOL4! It's a world where you can
- manipulate text and search for patterns in a simple and natural
- manner. SNOBOL4 is a completely general programming language,
- and its magic extends far beyond the world of text processing.
- Concise, powerful programs are easy to write. In addition,
- SNOBOL4's pattern programming provides a new way to work with
- computers. If you would like to add SNOBOL4 to your repertoire
- of problem-solving tools, and learn why so many people are
- excited about it, read on.
-
-
- 1.1 ABOUT THIS MANUAL
-
- This manual is divided into three parts. This part, "Getting
- Started," shows you how to create and run small programs with
- SNOBOL4.
-
- Part II, "Tutorial," is addressed to the beginning SNOBOL4 pro-
- grammer. It assumes a modest knowledge of general programming
- concepts, and experience with another high-level language, such
- as BASIC, C, FORTRAN, or Pascal. Readers without any programming
- background may wish to consult books written with them in mind:
- "A SNOBOL4 Primer" and "SNOBOL Programming for the Humanities,"
- listed in the file SNOBOL4.DOC.
-
- Part III, "Reference," is a complete description of Vanilla
- SNOBOL4. If you are already familiar with the SNOBOL4 language,
- you may wish to skip the tutorial section and proceed directly to
- the reference section for specific details. Later, you can
- return to the tutorial section for fresh insight into the lan-
- guage's use.
-
-
- 1.2 INSTALLING VANILLA SNOBOL4
-
-
- 1.2.1 System Requirements
-
- SNOBOL4 requires the following:
-
- 1. IBM PC, XT, AT, or any other 8086/88/186/286/386 family com-
- puter. Your computer need not be an IBM PC look-alike;
- SNOBOL4 requires MS-DOS compatibility only.
-
- 2. PC- or MS-DOS, Version 2.0 or above.
-
- 3. 105K bytes of free RAM memory.
-
-
-
-
- Getting Started - 1 - Installation
-
-
-
-
-
- 1.2.2 Making a Backup Copy
-
- The Vanilla SNOBOL4 distribution disk should never be used for
- production work. Always make a backup copy, and use it for your
- day-to-day activities:
-
- 1. Use the DOS FORMAT command to initialize a new, blank
- diskette.
-
- 2. If your system has two 5-1/4 inch diskette drives, place the
- SNOBOL4 diskette in drive A, and the new disk in drive B,
- and type:
-
- DISKCOPY A: B:
-
- 3. If you have only one diskette drive, enter:
-
- DISKCOPY A: A:
-
- and follow the instructions for swapping diskettes. The
- Vanilla SNOBOL4 diskette is the Source diskette, while the
- newly formatted diskette is the Target.
-
- If you have a fixed disk, you may create a subdirectory for
- SNOBOL4, and copy all of the SNOBOL4 disk to it.
-
-
- 1.2.3 Initial Checkout
-
- Place your backup disk in the default drive, and play a game of
- Tick-Tack-Toe. Our examples will assume a two-drive system,
- using drive B as the default drive. If you have a one-drive sys-
- tem, or are running SNOBOL4 from the fixed disk, your screen will
- display a different default drive letter (A or C). Enter:
-
- B>SNOBOL4 TICTAC
-
- The SNOBOL4 program should load, and compile the Tick-Tack-Toe
- program. The game will begin execution, and display instruc-
- tions.
-
-
- 1.3 AN EXAMPLE
-
- Just to get a feel for where we're going, let's take a look at
- a small SNOBOL4 program. It produces a sorted list of the words
- in a file, along with a count of how many times each word ap-
- pears. Don't be concerned if you don't understand the program; I
- just want to give you a taste of the language:
-
-
-
-
-
-
-
-
- Getting Started - 2 - Installation
-
-
-
-
-
- * Trim input, set up constants, and create table to
- * hold word counts
- &TRIM = 1
- WRDPAT = BREAK(&LCASE) SPAN(&LCASE "-'") . WORD
- TALLY = TABLE()
-
- * Read a line, convert upper case letters to lower case
- READ LINE = REPLACE(INPUT,&UCASE,&LCASE) :F(CONVERT)
-
- * Get and remove next word from LINE, place in variable WORD
- NEXTWRD LINE WRDPAT = :F(READ)
-
- * Increment the count for this word
- TALLY[WORD] = TALLY[WORD] + 1 :(NEXTWRD)
-
- * Convert the table to an array
- CONVERT RESULT = CONVERT(TALLY, "ARRAY") :F(NONE)
-
- * Display the results
- OUTPUT = "Word Counts"
- I = 1
- PRINT OUTPUT = RESULT[I,1] " - " RESULT[I,2] :F(END)
- I = I + 1 :(PRINT)
- NONE OUTPUT = "There aren't any words!"
- END
-
- Running the program with the sample text on the disk as input
- would produce a usage count like this:
-
- Word Counts
- hark - 2
- the - 1
- lark - 1
- at - 2
- heaven's - 1
- . . .
-
- Notice some of the things that seem to occur so effortlessly
- here: A word is defined to be any combination of lower case let-
- ters, hyphen, and apostrophe. Data from the file are converted
- to lower case. A table of word counts uses the words themselves
- as subscripts. The table is converted to an array in one state-
- ment, and printed without any knowledge of the array's size.
- Finally, because the definition of a word is contained in one
- succinct pattern, it's easy to modify the program to catalog
- other kinds of text patterns.
-
- Excluding comments and the END statement, there are 12 working
- statements in this program---and this program uses only a frac-
- tion of SNOBOL4's power. How much work would it be to write such
- a program in any other language you are familiar with? Is it
- possible that there is something unique about SNOBOL4?
-
- Let's go on now to write a simple first program.
-
-
-
- Getting Started - 3 - Installation
-
-
-
-
-
- Chapter 2
-
-
- FIRST PROGRAM
- -----------------------------------------------------------------
-
-
- 2.1 A FIRST PROGRAM
-
- For the following exercises, you should have SNOBOL4 available
- on your default disk drive, or in your default directory if a
- fixed disk is used. This manual assumes that drive B is your
- default disk drive, and will show the DOS prompt as "B>". Users
- with other hardware configurations may see "A>" or "C>".
-
- We will begin with a very simple program, one that prints a
- greeting on your computer's display screen. It will familiarize
- you with the mechanics of running a SNOBOL4 program. Every line
- you enter from the keyboard (or "console") should end by pressing
- the ENTER key (marked ─┘).
-
- You start the system by typing SNOBOL4 CON at the DOS command
- prompt B>. SNOBOL4 displays two title lines and prompts you to
- enter your program with a question mark on each line:
-
- B>SNOBOL4 CON
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
- Enter program, terminate with "END"
- ?
-
- Now enter the program. Use the tab character to begin the
- indented line, and be sure to place blanks on each side of the
- equal sign:
-
- ? OUTPUT = 'Hello world!'
- ?END
-
- No errors
-
- Hello world!
-
- B>
-
- As you enter each line, it is compiled into a compact internal
- notation. The first program line begins with a tab; the second
- is flush left. The word END is special; it signals SNOBOL4 that
- you have finished entering program lines. It must appear at the
- left margin to be recognized. After the END statement is
- entered, SNOBOL4 begins to run your program.
-
- This program consists of one "assignment statement."
- Assignment takes the value on the right side of the equals sign,
-
-
-
- Getting Started - 4 - First Program
-
-
-
-
-
- and stores it in the "variable" on the left. The value on the
- right is the character string literal 'Hello world!'. The
- variable's name is OUTPUT, which is a special name in SNOBOL4;
- values assigned to it are displayed on the screen. After the
- assignment statement is performed, control flows into the END
- statement and the program stops.
-
- SNOBOL4 only provides DOS in-line editing as you enter your
- program. It is not a program editor, and does not save your pro-
- gram or let you correct mistakes in previous program lines. Usu-
- ally, you'll want to prepare your program in a disk file.
-
- Try creating a program file in DOS. The symbol ^Z represents
- the DOS End-of-File character, which terminates the DOS COPY com-
- mand. It is created by entering control-Z or pressing function
- key 6.
-
- B>COPY CON HELLO.SNO
- OUTPUT = 'Hello world!'
- END
- ^Z
- 1 File(s) copied
- B>
-
- Now you can have SNOBOL4 read and execute your program from
- file HELLO.SNO:
-
- B>SNOBOL4 HELLO.SNO
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
-
- No errors
-
- Hello world!
-
- B>
-
- Of course, the program file could also have been created with
- your program text editor. If you are using a word processor,
- remember to produce an unadulterated ASCII file, free of any spe-
- cial format controls.
-
- SNOBOL4 assigns a unique number to each program statement. The
- statement number and line number are displayed whenever an error
- message is produced. To get a listing of your program with
- SNOBOL4's statement numbers, try:
-
-
-
-
-
-
-
-
-
-
- Getting Started - 5 - First Program
-
-
-
-
-
- B>SNOBOL4 HELLO /L=CON
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
- 1 OUTPUT = 'Hello world!'
- 2 END
-
- No errors
-
- Hello world!
-
- B>
-
- The first line, on which you typed SNOBOL4, is called the com-
- mand line. It may contain options that alter SNOBOL4's behavior.
- The option /L= tells SNOBOL4 to send a listing of your source
- file to the specified file or device. Another device, such as
- PRN:, would print a listing on your printer. Other command line
- options are discussed in Chapter 13, "Running a SNOBOL4 Program."
-
- In this example we omitted the file name extension. SNOBOL4
- will supply the .SNO extension for the source file if it is
- absent.
-
- You've now run a simple SNOBOL4 program in two ways: by typing
- it in directly, and by creating a disk file.
-
-
- 2.2 INTERACTIVE STATEMENT EXECUTION
-
- It's very helpful to "try out" simple statements as they are
- introduced in the text. There is a SNOBOL4 program called
- CODE.SNO on the distribution diskette to help you do this. Try
- it now with a few simple statements. Type END or control-Z to
- stop the program.
-
- B>SNOBOL4 CODE
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
-
- No errors
-
- Enter SNOBOL4 statements:
- ? OUTPUT = 'HELLO AGAIN!'
- HELLO AGAIN!
- Success
- ? OUTPUT = 16
- 16
- Success
- ?END
-
- B>
-
-
-
-
- Getting Started - 6 - First Program
-
-
-
-
-
- Feel free to experiment---you can't break anything by using
- this program. At most, you will get a SNOBOL4 error, and return
- to the DOS command prompt. In that case, just start SNOBOL4 and
- CODE.SNO over again.
-
- Whenever you see examples in the text that begin with a ques-
- tion mark, they are meant to be tried with CODE.SNO. In the text
- I'll omit the word Success most of the time unless it is relevant
- to the concept being presented, although it will still appear on
- your display. I'll also try to restrict the examples to upper
- case, so you can set the CAPS LOCK mode on your computer, and
- type without using the shift key.
-
- Let's now proceed to the tutorial.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Getting Started - 7 - First Program
-
-
-
-
-
- Chapter 3
-
-
- FUNDAMENTALS
- -----------------------------------------------------------------
-
- SNOBOL4 is really a combination of two kinds of languages: a
- conventional language, with several data types and a simple but
- powerful control structure, and a pattern language, with a struc-
- ture all its own. The conventional language is not block struc-
- tured, and may appear old-fashioned. The pattern language,
- however, remains unsurpassed, and is unique to SNOBOL4.
-
- You should try to master the conventional portion of SNOBOL4
- first. When you're comfortable with it, you can move on to pat-
- tern matching. Pattern matching by itself is a very large sub-
- ject, and this manual can only offer an introduction. The sample
- programs accompanying Vanilla SNOBOL4, as well as the many
- SNOBOL4 books available from Catspaw can be studied for a deeper
- understanding of patterns and their application.
-
- We'll begin by discussing data types, operators, and variables.
-
-
- 3.1 SIMPLE DATA TYPES
-
- SNOBOL4 has several different basic types, but has a mechanism
- to define hundreds more as aggregates of others. Initially,
- we'll discuss the two most basic: integers and strings.
-
-
- 3.1.1 Integers
-
- An integer is a simple whole number, without a fractional part.
- In SNOBOL4, its value can range from -32767 to +32767. It ap-
- pears without quotation marks, and commas should not be used to
- group digits. Here are some acceptable integers:
-
- 14 -234 0 0012 +12832 -9395 +0
-
- These are incorrect in SNOBOL4:
-
- 13.4 fractional part is not allowed
-
- 49723 larger than 32767
-
- - number must contain at least one digit
-
- 3,076 comma is not allowed
-
- Use the CODE.SNO program to test different integer values. Try
- both legal and illegal values. Here are some sample test lines:
-
-
-
-
-
- Tutorial - 8 - Fundamentals
-
-
-
-
-
- Enter SNOBOL4 statements:
- ? OUTPUT = 42
- 42
- ? OUTPUT = -825
- -825
- ? OUTPUT = 73768
- Compilation error: Erroneous integer, re-enter:
-
-
- 3.1.2 Reals
-
- Vanilla SNOBOL4 does not include real numbers. They are
- available in SNOBOL4+, Catspaw's highly enhanced implementation
- of the SNOBOL4 programming language.
-
-
- 3.1.3 Strings
-
- A string is an ordered sequence of characters. The order of
- the characters is important: the strings AB and BA are different.
- Characters are not restricted to printing characters; all of the
- 256 combinations possible in an 8-bit byte are allowed.
-
- Normally, the maximum length of a string is 5,000 characters,
- although you can tell SNOBOL4 to accept longer strings. A string
- of length zero (no characters) is called the null string. At
- first, you may find the idea of an empty string disturbing: it's
- a string, but it has no characters. Its role in SNOBOL4 is simi-
- lar to the role of zero in the natural number system.
-
- Strings may appear literally in your program, or may be created
- during execution. To place a literal string in your program, en-
- close it in apostrophes (')1 or double quotation marks (").
- Either may be used, but the beginning and ending marks must be
- the same. The string itself may contain one type of mark if the
- other is used to enclose the string. The null string is repre-
- sented by two successive marks, with no intervening characters.
- Here are some samples to try with CODE.SNO:
-
-
-
-
-
-
-
-
-
-
- ____________________
-
- 1 Apostrophe (single quote) should not be confused with the
- grave accent mark (`) which appears next to it on some computer
- keyboards. The grave accent may not be used as a string
- delimiter.
-
-
-
- Tutorial - 9 - Fundamentals
-
-
-
-
-
- ? OUTPUT = 'STRING LITERAL'
- STRING LITERAL
- ? OUTPUT = "So is this"
- So is this
- ? OUTPUT = ''
-
- ? OUTPUT = 'WHO COINED THE WORD "BYTE"?'
- WHO COINED THE WORD "BYTE"?
- ? OUTPUT = "WON'T"
- WON'T
-
-
- 3.2 SIMPLE OPERATORS
-
- If data is the raw material, operators are the tools that do
- the work. Some operators, such as + and -, appear in all pro-
- gramming languages, and pocket calculators. But SNOBOL4 provides
- many more, some of which are unique to the SNOBOL4 language.
- SNOBOL4 also allows you to define your own operators. We'll
- examine just a few basic operators below.
-
-
- 3.2.1 Unary vs. Binary
-
- SNOBOL4 operators require either one or two items of data,
- called operands. For example, the minus sign (-) can be used
- with one object. In this form, the operator is considered unary:
-
- -6
-
- or as a binary operator with two operands:
-
- 4 - 1
-
- In the first case, the minus sign negates the number. The sec-
- ond example subtracts 1 from 4. The minus sign's meaning depends
- on the context in which it appears. SNOBOL4 has a very simple
- rule for determining if an operator is binary or unary:
-
- Unary operators are placed immediately to the left of
- their operand. No blank or tab character may appear
- between operator and operand.
-
- Binary operators have one or more blank or tab charac-
- ters on each side.
-
- The blank or tab requirement for binary operators causes prob-
- lems for programmers first learning SNOBOL4. Most other lan-
- guages make these white space characters optional. Omitting the
- right hand blank after a binary operator will produce a unary
- operator, and while the statement may be syntactically correct,
- it will probably produce unexpected results. Fortunately, blanks
- and binary operators quickly become a way of SNOBOL4 life, and
- after some initial forgetfulness there are few problems.
-
-
-
- Tutorial - 10 - Fundamentals
-
-
-
-
-
- 3.2.2 Some Binary Operators
-
-
- Operation: Assignment
- Symbol: = (equals sign)
-
- You've already met one binary operator, the equals sign (=).
- It appeared in the first sample program:
-
- OUTPUT = 'Hello world!'
-
- It assigns, or transfers, the value of the object on the right
- ('Hello world!') to the object on the left (variable OUTPUT).
-
-
- Operation: Arithmetic
- Symbols: **, *, /, +, -
-
- These characters provide the arithmetic operations---exponenti-
- ation, multiplication, division, addition, and subtraction
- respectively. Each is assigned a priority, so SNOBOL4 knows
- which to perform first if more than one appear in an expression.
- Exponentiation is performed first, followed by multiplication,
- division, and finally addition and subtraction. SNOBOL4 is
- unusual in giving multiplication higher priority than division;
- most programming languages treat them equally.
-
- You may use parentheses to change the order of operations.
- Division of an integer by another integer will produce a trun-
- cated integer result; the fractional result is discarded. Try
- the following:
-
- ? OUTPUT = 3 - 6 + 2
- -1
- ? OUTPUT = 2 * (10 + 4)
- 28
- ? OUTPUT = 7 / 4
- 1
- ? OUTPUT = 3 ** 5
- 243
- ? OUTPUT = 10 / 2 * 5
- 1
- ? OUTPUT = (10 / 2) * 5
- 25
-
- When the same operator occurs more than once in an expression,
- which one should be performed first? The governing principle is
- called associativity, and is either left or right. Multiple
- instances of *, /, + and - are performed left to right, while
- **'s are performed right to left. Again, parentheses may be used
- to change the default order. Try a few examples:
-
-
-
-
-
-
- Tutorial - 11 - Fundamentals
-
-
-
-
-
- ? OUTPUT = 24 / 4 / 2
- 3
- ? OUTPUT = 24 / (4 / 2)
- 12
- ? OUTPUT = 2 ** 2 ** 3
- 256
- ? OUTPUT = (2 ** 2) ** 3
- 64
-
- Here's the first bit of SNOBOL4 magic: what happens if either
- operand is a string rather than an integer or real number? The
- action taken is one which is widespread throughout the SNOBOL4
- language; the system tries to convert the operand to a suitable
- data type. Given the statement
-
- ? OUTPUT = 14 + '54'
- 68
-
- SNOBOL4 detects the addition of an integer and a string, and
- tries to convert the string to a numeric value. Here the conver-
- sion succeeds, and the integers 14 and 54 are added together. If
- the characters in the string do not form an acceptable integer,
- SNOBOL4 produces the error message "Illegal data type."
-
- SNOBOL4 is strict about the composition of strings being con-
- verted to numeric values: leading or trailing blanks or tabs are
- not allowed. The null string is permitted, and converted to
- integer 0. Try producing some arithmetic errors:
-
- ? OUTPUT = 14 + ' 54'
- Execution error #1, Illegal data type
- Failure
- ? OUTPUT = 'A' + 1
- Execution error #1, Illegal data type
- Failure
-
- Note: Error numbers are listed in Chapter 20, "System Messages."
-
-
- Operation: Concatenation
- Symbols: blank or tab
-
- This is the fundamental operator for assembling strings. Two
- strings are concatenated simply by writing one after the other,
- with one or more blank or tab characters between them. There is
- no explicit symbol for concatenation (it is special in this
- regard), the white space between two objects serves to define
- this operator. The blank or tab character merely specifies the
- operation; it is not included in the resulting string.
-
- The string that results from concatenation is the right string
- appended to the end of the left. The two strings remain
- unchanged and a third string emerges as the result. Try a few
- simple concatenations with CODE.SNO:
-
-
-
- Tutorial - 12 - Fundamentals
-
-
-
-
-
- ? OUTPUT = 'CONCAT' 'ENATION'
- CONCATENATION
- ? OUTPUT = 'ONE,' 'TWO,' 'THREE'
- ONE,TWO,THREE
- ? OUTPUT = 'A' 'B' 'C'
- ABC
- ? OUTPUT = 'BEGINNING ' 'AND ' 'END.'
- BEGINNING AND END.
-
- The string resulting from concatenation can not be longer than
- the maximum allowable string size.
-
- The concatenation operator works only on character strings, but
- if an operand is not a string, SNOBOL4 will convert it to its
- string form. For example,
-
- ? OUTPUT = (20 - 17) ' DOG NIGHT'
- 3 DOG NIGHT
- ? OUTPUT = 19 (12 / 3)
- 194
-
- In the first case, concatenation's right operand is the string
- ' DOG NIGHT', but the left operand is an integer expression
- (20 - 17). SNOBOL4 performs the subtraction, converts the result
- to the string '3', and produces the final result '3 DOG NIGHT'.
- In the second example, the integer operands are converted to the
- strings '19' and '4', to produce the result string '194'. This
- is not exactly good math, but it is correct concatenation.
-
- You must be careful however. If you accidentally omit an
- operator, SNOBOL4 will think you intended to perform concatena-
- tion. In the example above, perhaps we omitted a minus sign and
- had really meant to say:
-
- ? OUTPUT = 19 - (12 / 3)
- 15
-
- It is always possible for concatenation to automatically con-
- vert a number to a string. But there is one important exception
- when SNOBOL4 doesn't try to do this: if either operand is the
- null string, the other operand is returned unchanged. It is not
- coerced into the string data type. If the first example were
- changed to:
-
- ? OUTPUT = (20 - 17) ''
- 3
-
- the result is the INTEGER 3. You'll find you'll use this aspect
- of null string concatenations extensively in your SNOBOL4 pro-
- gramming.
-
- Before we proceed, let's think about the null string one more
- time as the string equivalent of the number zero. First of all,
- adding zero to a number does not change its value, and concatena-
-
-
-
- Tutorial - 13 - Fundamentals
-
-
-
-
-
- ting the null string with an object doesn't change it, either.
- Second, just as a calculator is cleared to zero before adding a
- series of numbers, the null string can serve as the starting
- place for concatenating a series of strings.
-
-
- 3.2.3 Some Unary Operators
-
- There aren't many interesting unary operators at this point in
- your tour of SNOBOL4. Most of them appear in connection with
- pattern matching, discussed later. Note, however, that all unary
- operations are performed before binary operations, unless prece-
- dence is altered by parentheses.
-
-
- Operation: Arithmetic
- Symbols: +, -
-
- These unary operators require a single numeric operand, which
- must immediately follow the operator, without an intervening
- blank or tab. Unary minus (-) changes the arithmetic sign of its
- operand; unary plus (+) leaves the sign unchanged. If the
- operand is a string, SNOBOL4 will try to convert it to a number.
- The null string is converted to integer 0. Coercing a string to
- a number with unary plus is a noteworthy technique. Try unary
- plus and minus with CODE.SNO:
-
- ? OUTPUT = -(3 * 5)
- -15
- ? OUTPUT = +''
- 0
-
-
- 3.3 VARIABLES
-
- A variable is a place to store an item of data. The number of
- variables you may have is unlimited, provided you give each one a
- unique name. Think of a variable as a box, marked on the outside
- with a permanent name, able to hold any data value or type. Many
- programming languages require that you formally declare what kind
- of entity the box will contain---integer, real, string, etc.---
- but SNOBOL4 is more flexible. A variable's contents may change
- repeatedly during program execution. The size of the box con-
- tracts or expands as necessary. One moment it might contain an
- integer, then a 2,000 character string, then the null string; in
- fact, any SNOBOL4 data type.
-
- There are only a few rules about composing a variable's name
- when it appears in your program:
-
- 1. The name must begin with an upper- or lower-case letter.
-
- 2. If it is more than one character long, the remaining charac-
- ters may be any combination of letters, numbers, or the
-
-
-
- Tutorial - 14 - Fundamentals
-
-
-
-
-
- characters period (.) and underscore (_).
-
- 3. The name may not be longer than the maximum line length (120
- characters).
-
- Here are some correct SNOBOL4 names:
-
- WAGER P23 VerbClause SUM.OF.SQUARES Buffer
-
- Normally, SNOBOL4 performs "case-folding" on names. Lower-case
- alphabetic characters are changed to upper-case when they appear
- in names---Buffer and BUFFER are equivalent. Naturally, case-
- folding of data does not occur within a string literal. Case-
- folding can be disabled by the command line option /C.
-
- In some languages, the initial value of a new variable is
- undefined. SNOBOL4 guarantees that a new variable's initial
- value is the null string. However, except in very small pro-
- grams, you should always initialize variables. This prevents
- unexpected results when a program is modified or a program seg-
- ment is reexecuted.
-
- You store something in a variable by making it the object of an
- assignment operation. You can retrieve its contents simply by
- using it wherever its value is needed. Using a variable's value
- is nondestructive; the value in the box remains unchanged. Try
- creating some variables using CODE.SNO:
-
- ? ABC = 'EGG'
- ? OUTPUT = ABC
- EGG
- ? D = 'SHELL'
- ? OUTPUT = abc d (Same as ABC D)
- EGGSHELL
- ? OUTPUT = NONESUCH (New variable is null)
-
- ? OUTPUT = ABC NULL D
- EGGSHELL
- ? N1 = 43
- ? D = 17
- ? OUTPUT = N1 + D
- 60
- ? output = ABC D
- EGG17
-
- OUTPUT is a variable with special properties; when a value is
- stored in its box, it is also displayed on your screen. There is
- a corresponding variable named INPUT, which reads data from your
- keyboard. Its box has no permanent contents. Whenever SNOBOL4
- is asked to fetch its value, a complete line is read from the
- keyboard and used instead. If INPUT were used twice in one
- statement, two separate lines of input would be read. Try these
- examples:
-
-
-
-
- Tutorial - 15 - Fundamentals
-
-
-
-
-
- ? OUTPUT = INPUT
- TYPE ANYTHING YOU DESIRE
- TYPE ANYTHING YOU DESIRE
- ? TWO.LINES = INPUT '-AND-' INPUT
- FIRST LINE
- SECOND LINE
- ? OUTPUT = TWO.LINES
- FIRST LINE-AND-SECOND LINE
-
- SNOBOL4 variables are global in scope---any variable may be
- referenced anywhere in the program.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 16 - Fundamentals
-
-
-
-
-
- Chapter 4
-
-
- CONTROL FLOW AND FUNCTIONS
- -----------------------------------------------------------------
-
-
- 4.1 SUCCESS AND FAILURE
-
- Success and failure are as important in SNOBOL4 as they are in
- life. Success and failure are unmistakable signals; something
- either worked, or it didn't. Significant program conciseness is
- achieved by recognizing that data values and signals are funda-
- mentally different entities.
-
- The elements of a statement provide values and signals as com-
- putation proceeds. SNOBOL4 accumulates both, and stops executing
- a particular statement when it finds it cannot succeed. Program
- flow can be altered based upon this success or failure.
-
- The success signal will have a value result associated with it.
- In situations in which the signal itself is the desired object,
- the result value may only be the null string. The failure signal
- has no associated value. (In some instances, it may be helpful
- to view failure as meaning "failure to produce a result.")
-
- Previously, we introduced the variable INPUT, which reads a
- line from the keyboard. In general, INPUT can be made to read
- from any disk file. The line read may be any character string,
- including the null string if it is an empty line. If any string
- might appear, then there is no special value we can test for to
- detect End-of-File. Success and failure provide an elegant
- alternative to testing for special values.
-
- When we retrieve a value from INPUT, we normally get a string
- and a success signal. But when End-of-File is encountered, we
- get a failure signal instead, and no value.
-
- Since control-Z (or function key 6) allows you to enter an End-
- of-File from the keyboard, we can easily demonstrate this type of
- failure. As you've noticed, the CODE.SNO program reports the
- success or failure of each statement. So far, all examples have
- succeeded. Now try this one:
-
- ? OUTPUT = INPUT
- ^Z
- Failure
-
- Success and failure are control signals, and appear only during
- the execution of a statement. They cannot be stored in a vari-
- able, which holds values only.
-
- There is much more which can be done with success and failure,
- but to understand their use, you'll need to know how SNOBOL4
-
-
-
- Tutorial - 17 - Control Flow and Functions
-
-
-
-
-
- statements are constructed.
-
-
- 4.2 A SNOBOL4 STATEMENT
-
- In general, a SNOBOL4 statement looks like this:
-
- Label Statement body :GOTO
-
- The label is optional, and is omitted by placing a blank or tab
- in the first character position. The GOTO is also optional, and
- can be eliminated simply by omitting it and the colon. In fact,
- even the statement body is optional. You can have a program line
- consisting of just a label or a GOTO field.
-
-
- 4.2.1 The Label Field
-
- SNOBOL4 normally executes the statements of a program in
- sequence. The ability to transfer control from one statement to
- another, perhaps conditionally, makes SNOBOL4 much more usable.
-
- Labels provide names for statements. If present, they must
- begin in the first character position of a statement, and must
- start with a letter or number. Additional characters may be any-
- thing but blank or tab. Like variable names, lower-case letters
- are equivalent to upper-case when case-folding (the default).
-
-
- 4.2.1 The GOTO Field
-
- Transfer of control is made possible by the GOTO. It inter-
- rupts the normal sequential execution of statements by telling
- SNOBOL4 which statement to execute after the present one. The
- GOTO field appears at the end of the statement, preceded by a
- colon (:), and has one of these forms:
-
- :(label)
- :S(label)
- :F(label)
- :S(label1) F(label2)
-
- White space is required before the colon. "Label" is the name
- given the target statement, and must be enclosed in parentheses.
- If the first form is used, execution resumes at the referenced
- statement, unconditionally. In the second and third forms,
- transfer occurs only if the statement has succeeded or failed,
- respectively. Otherwise, execution proceeds to the next state-
- ment in line. If the fourth form is used, transfer is made to
- label1 if the statement succeeded, or to label2 if it failed. A
- statement with a label and a GOTO would look like this:
-
- COPY OUTPUT = INPUT :F(DONE)
-
-
-
-
- Tutorial - 18 - Control Flow and Functions
-
-
-
-
-
- Now let's write a short program which copies keyboard input to
- the screen, and reports the total number of lines. If you are an
- accurate typist, you can type it into SNOBOL4 directly. Other-
- wise, you should use your text editor to create a file containing
- the program text. First stop the CODE.SNO program by typing END:
-
- ?END
-
- B>SNOBOL4 CON
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
- Enter program, terminate with "END"
- ? N = 0
- ?COPY OUTPUT = INPUT :F(DONE)
- ? N = N + 1 :(COPY)
- ?DONE OUTPUT = 'THERE WERE ' N ' LINES'
- ?END
-
- No errors
-
- TYPE IN A TEST LINE
- TYPE IN A TEST LINE
-
- AND ANOTHER
- AND ANOTHER
-
- ^Z
- THERE WERE 2 LINES
-
- B>
-
- We start the line count in variable N at 0. The next statement
- has a label, COPY, a statement body, and a GOTO field. It is an
- assignment statement, and begins execution by reading a line of
- input. If INPUT successfully obtains a line, the result is
- stored in OUTPUT. The GOTO field is only testing for failure, so
- SNOBOL4 proceeds to the next statement, where N is incremented,
- and the unconditional GOTO transfers back to statement COPY.
-
- When an End-of-File is read, variable INPUT signals failure.
- Execution of this statement terminates immediately, without per-
- forming the assignment, and transfers to the statement labeled
- DONE. The number of lines is displayed, and control flows into
- the END statement, stopping the program.
-
-
- 4.3 BUILT-IN FUNCTIONS
-
- A function is analogous to an operator; it operates on data to
- produce a result. The data objects are called the arguments of
- the function. The result returned---the function of the argu-
- ments---may have two components: the success or failure signal;
- and for success, a value. The value may be any data type.
-
-
-
- Tutorial - 19 - Control Flow and Functions
-
-
-
-
-
- A function is used by writing its name and a list of arguments
- enclosed by parentheses:
-
- FUNCTION_NAME(ARG1, ARG2, ..., ARGn)
-
- It may appear in your program anywhere a constant is allowed---
- in expressions, patterns, even as the argument of another func-
- tion. If the function has more than one argument, they should be
- separated by commas. If trailing arguments are omitted, SNOBOL4
- will supply the null string instead. Some functions, such as one
- that returns the current date, have no arguments at all.
-
- SNOBOL4 provides a large number of predefined functions, and
- allows you to define your own. The large repertoire of built-in
- functions makes SNOBOL4 programming easier. Most functions are
- concerned with pattern matching, input/output, and advanced fea-
- tures of the language. Here we'll introduce a few simple
- conditional, numeric, and string functions to give you an idea of
- the variety. Try them interactively with CODE.SNO.
-
-
- 4.3.1 Conditional Functions
-
- These functions fail or succeed depending upon their arguments.
- They are sometimes called predicate functions because the success
- of an expression using them is predicated upon their success. If
- they succeed, they return the null string as their value.
-
- Function Succeeds if:
-
- IDENT(S,T) S and T are identical. S and T may be con-
- stants or variables with any data type. To
- be identical, the arguments must have the
- same data type and value. Since omitted ar-
- guments default to the null string, IDENT(S)
- succeeds if S is the null string.
-
- DIFFER(S,T) S and T are different. DIFFER is the oppo-
- site of IDENT. DIFFER(S) succeeds if S is
- not the null string.
-
- EQ(X,Y) Integers X and Y are equal. X and Y must be
- integers, or strings which can be converted
- to integers.
-
- NE(X,Y) Integers X and Y are not equal.
-
- GE(X,Y) Integer X is greater than or equal to Y.
-
- GT(X,Y) Integer X is greater than Y.
-
- LE(X,Y) Integer X is less than or equal to Y.
-
- LT(X,Y) Integer X is less than Y.
-
-
-
- Tutorial - 20 - Control Flow and Functions
-
-
-
-
-
- INTEGER(X) X is an integer, or a string which can be
- converted to an integer.
-
- LGT(S,T) String S is lexically greater than string T
- using a character-by-character comparison.
-
- Leading blanks may be used in front of a argument for readabil-
- ity. Here are some exercises for CODE.SNO:
-
- ? N = 3
- ? EQ(N, 3)
- Success
- ? IDENT(N, 3)
- Success
- ? EQ(3, "3")
- Success
- ?IDENT(3, "3") (integer and string)
- Failure
- ? EQ(N, 4)
- Failure
- ? NE(N, 4)
- Success
- ? INTEGER(N)
- Success
- ? INTEGER('47')
- Success
- ? DIFFER('ABC', 'abc')
- Success
- ? IDENT('a' 'b' 'c', 'abc')
- Success
- ? LGT('ABC', 'ABD')
- Failure
-
- When any of these functions succeed, they return a null string.
- Since other statement elements are not altered when concatenated
- with the null string, this provides an easy way to interpose
- tests and construct loops. Suppose we execute the statement:
-
- N = LT(N,10) N + 1 :S(LOOP)
-
- Function LT fails if N is 10 or greater. If the statement
- fails, the assignment is not performed, and execution continues
- with the next statement. However, if LT succeeds, its null
- string value is concatenated with the expression N + 1, and the
- result is assigned to N. This has the effect of increasing N by
- 1 and transferring to statement LOOP until N reaches 10.
-
- If we concatenated several conditional functions together, and
- they all succeeded, the result would still be the null string.
- If any function failed, the entire concatenation would fail.
- This gives us a simple way to produce a successful result if a
- number of conditions are all true. For example, the expression:
-
- INTEGER(N) GE(N,5) LE(N,100)
-
-
-
- Tutorial - 21 - Control Flow and Functions
-
-
-
-
-
- succeeds if N is an integer between 5 and 100.
-
-
- 4.3.2 Other Functions
-
- These functions always succeed; all but REMDR and SIZE return a
- string result.
-
- DATE() Return current date and time as a string.
-
- DUPL(S,N) Duplicate string S, N times.
-
- REMDR(X,Y) Produce the remainder (modulus) of X / Y.
-
- REPLACE(S1,S2,S3) Return string S1 after performing the
- character replacements specified by strings
- S2 and S3. S2 specifies which characters to
- replace, and S3 specifies what to replace
- them with.
-
- SIZE(S) Return the number of characters in string S.
-
- TRIM(S) Return string S with trailing blanks removed.
-
- Exercises for CODE.SNO:
-
- ? OUTPUT = 'THE DATE AND TIME ARE: ' DATE()
- THE DATE AND TIME ARE: 10-19-87 11:49:33.90
- ? OUTPUT = DUPL('ABC', 20)
- ABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABC
- ? OUTPUT = SIZE('ZIPPY')
- 5
- ? OUTPUT = SIZE('')
- 0
- ? OUTPUT = TRIM('TRAILING BLANKS ') 'GONE'
- TRAILING BLANKSGONE
- ? OUTPUT = REPLACE('spoon','po','PO')
- sPOOn
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 22 - Control Flow and Functions
-
-
-
-
-
- Chapter 5
-
-
- INPUT/OUTPUT AND KEYWORDS
- -----------------------------------------------------------------
-
-
- 5.1 INPUT/OUTPUT
-
- We've already performed simple input and output with variables
- INPUT and OUTPUT. In this chapter, you'll learn more about
- SNOBOL4's I/O capabilities.
-
- SNOBOL4 can communicate with up to 16 different files at once.
- A "file" is either a disk file or a device, such as a printer.
- Every file is identified by a "unit number," which is an integer
- between 1 and 16. You chose the numbers when you select the
- files you wish to use. The particular numbers chosen have no
- special significance; they just distinguish one file from
- another.
-
- Actual input or output of data is performed by "associating" a
- variable with a unit number and a direction. When a statement
- tries to use the variable's value, a line is read from the asso-
- ciated file. When a value is stored in the variable, a line is
- written to the associated file. INPUT and OUTPUT are variables
- whose association with the keyboard and screen were preset by
- SNOBOL4. For historical reasons, they use unit numbers 5 and 6
- respectively.
-
- Strings are the only data type which can be transferred to and
- from files. A successful input operation always returns a
- string. During output, nonstring objects, such as integers, are
- automatically converted to their string form.
-
- The functions INPUT and OUTPUT (not to be confused with the
- variables INPUT and OUTPUT) are provided to attach a unit number
- to a variable, and optionally, to a particular file. Their names
- are distinguished from the variables of the same names by appear-
- ing as functions, that is, with parentheses and an argument list.
-
-
- 5.1.1 Associating File Names and Units
-
- There are two ways to tell SNOBOL4 what file will be used with
- a particular unit number:
-
- 1. As an option on the SNOBOL4 command line, like this:
-
- B>SNOBOL4 PROGRAM /2=ADDRESS.TXT /8=RESULT.DAT
-
- Here, unit number 2 is associated with the file named
- 'ADDRESS.TXT', and unit number 8 with file 'RESULT.DAT'. It
- will still be necessary to use the INPUT or OUTPUT function
-
-
-
- Tutorial - 23 - Input/Output and Keywords
-
-
-
-
-
- described below to associate variables with these unit num-
- bers. This method works best when different files will be
- used each time the program is run.
-
- 2. Use a string containing the file name as the fourth argument
- to the INPUT or OUTPUT function, as in:
-
- INPUT(..., 2, ..., 'ADDRESS.TXT')
-
- This method is better when the file name will not change, or
- is a string derived from a dialogue with the user, or is
- produced from a string calculation.
-
- A file name consisting of a single hyphen ("-") is reserved,
- and specifies the MS-DOS standard input file when used with
- the INPUT function, or the standard output file when used
- with the OUTPUT function. These standard input or output
- files may be redirected on the command line using the MS-DOS
- redirection operators ("<filename" and ">filename").
-
-
- 5.1.2 Input
-
- This function associates a variable with data read from a file:
-
- INPUT('variable', unit, length, 'file')
-
- It succeeds and returns the null string if the file was found
- and successfully opened, and fails otherwise. Length is an
- optional integer that specifies the line length. If the file
- name argument is omitted, SNOBOL4 consults the command line to
- find the file to use with this unit.
-
- For example, to open file TEXT.IN for input as unit 1, and as-
- sociate variable READLINE with it, we would say
-
- INPUT('READLINE', 1, , 'TEXT.IN') :S(OK)
- OUTPUT = 'Could not find file' :(END)
- OK . . .
-
- If the file name were specified on the command line as
- /1=TEXT.IN, we only need the first two arguments to INPUT:
-
- INPUT('READLINE', 1) :S(OK)
- OUTPUT = 'Could not find file' :(END)
- OK . . .
-
- To read a line from the file, we simply use READLINE in a
- statement. The statement fails when the End-of-File is read:
-
- LINE = READLINE :F(END.OF.FILE)
-
-
-
-
-
-
- Tutorial - 24 - Input/Output and Keywords
-
-
-
-
-
- Each file-associated variable will have a line length associ-
- ated with it (80 characters unless SNOBOL4 is told otherwise in
- the length argument). Normally, reading stops at each end-of-
- line character (carriage return). If more than the line length
- has been read, the extra characters are discarded. If a short
- line is encountered, SNOBOL4 pads the line with blanks to produce
- the full line length. The end-of-line character is not included
- in the string returned.
-
- Blank padding is another historic feature from the days when
- most input was on punch cards. The next section, "Keywords,"
- will show you how to disable it. You can also use the TRIM func-
- tion to remove superfluous trailing blanks. The previous state-
- ment then becomes:
-
- LINE = TRIM(READLINE) :F(END.OF.FILE)
-
- When READLINE encounters the End-of-File, its failure signal is
- propagated outward, causing function TRIM to fail. This failure
- is detected in the GOTO field in the usual manner.
-
-
- 5.1.3 Output
-
- This function associates a variable with data written to a
- file. If the file does not exist, it is created. If it already
- exists, its previous contents are discarded.
-
- OUTPUT('variable', unit, length, 'file')
-
- The function succeeds and returns the null string if the file
- was successfully opened for output, and fails otherwise.
-
- We write data to the file by assigning it to the associated
- variable. In this example, we will use a variable called PRINT,
- and the DOS device PRN: with a line length of 132 characters:
-
- OUTPUT('PRINT', 2, 132, 'PRN:') :S(PRTOK)
- OUTPUT = 'Could not attach printer' :(END)
- PRTOK PRINT = 'Text Listing - ' DATE()
- . . .
-
- If the string assigned to an output variable is longer than the
- line length, SNOBOL4 will output as many lines as necessary of
- the standard line length to accommodate the string. SNOBOL4 sup-
- plies the carriage return and line feed characters at the end of
- each line.
-
- Once again, the output file name could be given on the command
- line (/2=PRN:). The function call would then look like this:
-
- OUTPUT('PRINT', 2, 132) :S(PRTOK)
- . . .
-
-
-
-
- Tutorial - 25 - Input/Output and Keywords
-
-
-
-
-
- 5.1.4 Changing I/O Defaults
-
- Having INPUT and OUTPUT associated with the keyboard and screen
- may be altered in the SNOBOL4 command line. A surprising number
- of programs can be written this way, using only the variables
- INPUT and OUTPUT for I/O. The command line phrase /I=FILENAME,
- associates INPUT with the named file, and /O=FILENAME does the
- same for OUTPUT. SNOBOL4 makes all the associations for you; no
- call to the INPUT or OUTPUT function is required.
-
- SNOBOL4 also provides the pre-associated variable SCREEN.
- Using SCREEN allows your program to post messages to the display
- even if OUTPUT has been redirected elsewhere.
-
- If we have a program written in terms of variables INPUT and
- OUTPUT, it can be run without alteration with different data
- files. For example, the following program will copy INPUT to
- OUTPUT, and place the line length and a blank in front of each
- line:
-
- LOOP S = TRIM(INPUT) :F(END)
- OUTPUT = SIZE(S) ' ' S :(LOOP)
- END
-
- Suppose we associate file TEXT.IN with INPUT, and TEXT.OUT with
- OUTPUT. We've supplied the morning song from Shakespeare's
- Cymbeline in file TEXT.IN, and the program above in file
- LENGTH.SNO. You can run it like this:
-
- B>SNOBOL4 LENGTH /I=TEXT.IN /O=TEXT.OUT
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
-
- No errors
-
- B>TYPE TEXT.OUT
- 44 Hark! hark! the lark at heaven's gate sings,
- . . .
-
- SNOBOL4 will supply the default file name extensions .IN and
- .OUT for the /I and /O options, so the command line could be
- shortened to:
-
- B>SNOBOL4 LENGTH /I=TEXT /O=TEXT
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 26 - Input/Output and Keywords
-
-
-
-
-
- 5.2 KEYWORDS
-
- Input/Output allows your program to communicate with the out-
- side world. Your program may also communicate with the SNOBOL4
- system itself. Keywords allow you to modify SNOBOL4's behavior,
- and to obtain information from the system. A keyword consists of
- the ampersand character (&) followed by an alphabetic name. They
- are used in a statement in the same way as a variable. They
- either provide values or have values assigned to them. Numeric
- keywords are restricted to integer values.
-
- -----------------------------------------------------------------
-
- &TRIM Remove trailing blanks
-
- Normally, short lines read from a file are padded with blank
- characters to the standard line length. In LENGTH.SNO, we used
- the function TRIM(INPUT) to remove those blanks. A simpler
- method assigns an integer value to keyword &TRIM to control
- padding. If &TRIM is set to a nonzero value, blanks are not
- appended, and any trailing blanks are removed. A statement to do
- this looks like this:
-
- &TRIM = 1
-
- Since trailing blanks are usually not desired, you'll often see
- this statement at the beginning of many SNOBOL4 programs.
-
- -----------------------------------------------------------------
-
- &MAXLNGTH Maximum string length
-
- This keyword controls the maximum permissible string length.
- Its initial value is 5000, but it may be set to any positive
- integer from 0 to 32767. Setting it to 0 is going to severly re-
- strict what you can do, since only the null string will be avail-
- able to you!
-
- -----------------------------------------------------------------
-
- &DUMP Termination dump of variables
-
- This keyword is useful for debugging programs because it tells
- SNOBOL4 to display the values of your variables when your program
- terminates. Setting &DUMP to a positive, nonzero integer causes
- the variable names to be sorted alphabetically. A negative inte-
- ger produces a unsorted dump. Zero is the default value, inhibi-
- ting the dump. Only variables with nonnull values are displayed.
-
-
-
-
-
-
-
-
-
- Tutorial - 27 - Input/Output and Keywords
-
-
-
-
-
- -----------------------------------------------------------------
-
- &ALPHABET Complete character set
-
- This keyword contains a 256 character string, the computer's
- entire character set in ascending sequence. It is called a pro-
- tected keyword because it cannot be modified by your program.
-
- -----------------------------------------------------------------
-
- &LCASE Lower case letters
-
- This keyword contains the 26 lower case alphabetic characters,
- "abcdefghijklmnopqrstuvwxyz".
-
- -----------------------------------------------------------------
-
- &UCASE Upper case letters
-
- This keyword contains the 26 upper case alphabetic characters,
- "ABCDEFGHIJKLMNOPQRSTUVWXYZ".
-
-
- 5.3 PROGRAMS WITHOUT PATTERN MATCHING
-
- You now have the ingredients to create some simple programs.
- However, if this were all of the SNOBOL4 language, there would be
- very little reason to use it. We'll get to pattern matching
- shortly, where you'll find many new, challenging concepts.
- First, however, you should be comfortable with the preceding
- material.
-
- Take a few minutes to examine and run the following programs.
-
-
- 5.3.1 File Counts - FCOUNTS.SNO
-
- This program counts the number of characters and lines in a
- file. Because real numbers are not available in Vanilla SNOBOL4,
- you should only use this program with input files smaller than
- 32,767 characters.
-
- &TRIM = 1
- CHARS = 0
- NEXTL CHARS = CHARS + SIZE(INPUT) :F(DONE)
- LINES = LINES + 1 :(NEXTL)
- DONE OUTPUT = CHARS ' characters'
- OUTPUT = +LINES ' lines read'
- END
-
-
-
-
-
-
-
-
- Tutorial - 28 - Input/Output and Keywords
-
-
-
-
-
- In such a small program, it's permissible to rely upon the fact
- that the system initializes LINES to the null string. The first
- use of the statement:
-
- LINES = LINES + 1 :(NEXTL)
-
- converts LINES from the null string to an integer value. We used
- the expression +LINES in the last statement to produce an integer
- 0 (instead of the null string), if the input file were empty. To
- count the characters and lines in a file, use the /I= option, as
- in:
-
- B>SNOBOL4 FCOUNTS.SNO /I=TEXT.IN
-
-
- 5.3.2 Formatting Text - TRIPLET.SNO
-
- This program reformats a file by centering the lines and ar-
- ranging them in groups of three. Note that statements containing
- an asterisk in column one are considered comments by SNOBOL4.
-
- * Trim input, count input lines:
- &TRIM = 1
- N = 0
-
- * Read next input line, all done if End-of-File.
- LOOP S = INPUT :F(END)
-
- * Precede with blanks to center within 80 character line:
- OUTPUT = DUPL(' ', (80 - SIZE(S)) / 2) S
-
- * Increment count, but reset to 0 every third line.
- * Also, output a blank line when count resets:
- N = REMDR(N + 1, 3)
- OUTPUT = EQ(N, 0) :(LOOP)
- END
-
- This program uses the DUPL function to produce the leading
- blanks required to center a line. A simple calculation based on
- each line's width determines the number of blanks needed.
-
- The last two statements break the file lines into triplets.
- The REMDR function returns the integer remainder (modulus) when
- the first argument is divided by the second. In this case,
- assigning the result to variable N causes N to continually cycle
- through the values 0, 1, 2, 0, 1, .... When N is 0, the last
- statement assigns the null string to OUTPUT, producing a blank
- line. If N is 1 or 2, EQ fails, and the assignment fails.
-
- Try running the program with the sample text file:
-
- B>SNOBOL4 TRIPLET /I=TEXT
-
-
-
-
-
- Tutorial - 29 - Input/Output and Keywords
-
-
-
-
-
- Chapter 6
-
-
- PATTERN MATCHING
- -----------------------------------------------------------------
-
-
- 6.1 INTRODUCTION
-
- Pattern matching examines a "subject" string for some combina-
- tion of characters, called a "pattern." The matching process may
- be very simple, or extremely complex. For example:
-
- 1. The subject contains several color names. The pattern is
- the string "BLUE". Does the subject string contain the word
- "BLUE"?
-
- 2. The subject contains a nucleic acid (DNA) sequence. The
- pattern searches for a subsequence that is replicated in two
- other places in the string.
-
- 3. The subject contains a paragraph of English text. The
- pattern describes the spacing rules to be applied after
- punctuation. Does the subject string conform to the
- punctuation rules?
-
- 4. The subject string represents the current board position in
- a game of Tick-Tack-Toe. The pattern examines this string
- and determines the next move.
-
- 5. The subject contains a program statement from a prototype
- computer language. The pattern contains the grammar of that
- language. Is the statement properly formed according to the
- grammar?
-
- Most programming languages provide rudimentary facilities to
- examine a string for a specific character sequence. SNOBOL4 pat-
- terns are far more powerful, because they can specify complex
- (and convoluted) interrelationships. The colors of a painting,
- the words of a sentence, the notes of a musical score have lim-
- ited significance in isolation. It is their "relationship" with
- one another which provides meaning to the whole. Likewise,
- SNOBOL4 patterns can specify "context;" they may be qualified by
- what precedes or follows them, or by their position in the
- subject.
-
-
- 6.1.1 Knowns and Unknowns
-
- Patterns are composed of "known" and "unknown" components.
-
- "Knowns" are specific character strings, such as the string
- "BLUE" in the first example above. We are looking for a yes/no
- answer to the question: "Does this known item appear in the sub-
-
-
-
- Tutorial - 30 - Pattern Matching
-
-
-
-
-
- ject string?"
-
- "Unknowns" specify the "kind" of subject characters we are
- looking for; the specific characters are not identifiable in
- advance. We might want to match only characters from a
- restricted alphabet, or any substring of a certain length, or
- some arbitrary number of repetitions of a string. If the pattern
- matches, we can then "capture" the particular subject substring
- matched.
-
-
- 6.2 SPECIFYING PATTERN MATCHING
-
- A pattern match requires a subject string and a pattern. The
- subject is the first statement element after the label field (if
- any). The pattern appears next, separated from the subject by
- white space (blank or tab). If SUBJECT is the subject string,
- and PATTERN is the pattern, it looks like this:
-
- label SUBJECT PATTERN
-
- The pattern match "succeeds" if the pattern is found in the
- subject string; otherwise it fails. This success or failure may
- be tested in the GOTO field:
-
- label SUBJECT PATTERN :S(label1) F(label2)
-
- A real point of confusion is the distinction between pattern
- matching and concatenation. How do you tell the difference?
- Where does the subject end and the pattern begin? In this case,
- parentheses should be placed around the subject, since SNOBOL4
- always uses the first "complete" statement element as the
- subject. In the statement
-
- X Y Z
-
- X is the subject, and Y concatenated with Z is the pattern.
- Whereas
-
- (X Y) Z
-
- indicates the subject is string X concatenated with string Y, and
- the pattern is Z.
-
-
- 6.3 SUBJECT STRING
-
- The subject string may be a literal string, a variable, or an
- expression. If it is not a string, its string equivalent will be
- produced before pattern matching begins. For example, if the
- subject is the integer 48, integer to string conversion produces
- the character string "48". Remember, if the subject includes
- concatenated elements, they should be enclosed in parentheses.
-
-
-
-
- Tutorial - 31 - Pattern Matching
-
-
-
-
-
- 6.4 PATTERN SUBSEQUENTS AND ALTERNATES
-
- Arithmetic expressions are composed of elements and simpler
- subexpressions. Similarly, patterns are composed of simpler sub-
- patterns which are joined together as "subsequents" and "alter-
- nates." If P1 and P2 are two subpatterns, the expression
-
- P1 P2
-
- is also a pattern. The subject must contain whatever P1 matches,
- immediately followed by whatever P2 matches. P2 is the "subse-
- quent" of P1. The white space (blank or tab) between P1 and P2
- is the same binary concatenation operator previously used to join
- strings; its use with patterns is completely analogous. The pre-
- ceding pattern matches pattern P1 "followed by pattern" P2.
-
- The binary "alternation" operator is the vertical bar (|). As
- it is a binary operator, it must have white space on each side.
- The pattern
-
- P1 | P2
-
- matches whatever P1 matches, "or" whatever P2 matches. SNOBOL4
- tries the various alternatives from left to right.
-
- Normally, concatenation is performed before alternation, so the
- pattern
-
- P1 | P2 P3
-
- matches P1 alone, or P2 "followed by" P3. Parentheses can be
- used to alter the grouping of subpatterns. For example:
-
- (P1 | P2) P3
-
- matches P1 "or" P2, followed by P3.
-
- When a pattern successfully matches a portion of the subject,
- the matching subject characters are "bound" to it. The next pat-
- tern in the statement must match beginning with the very next
- subject character. If a subsequent fails to match, SNOBOL4 back-
- tracks, unbinding patterns until another alternative can be
- tried. A pattern match fails when SNOBOL4 cannot find an alter-
- native that matches.
-
- The null string may appear in a pattern. It always matches,
- but does not bind any subject characters. We can think of it as
- matching the invisible space "between" two subject characters.
- One possible use is as the last of a series of alternatives. For
- example, the pattern
-
- ROOT ('S' | 'ES' | '')
-
- matches the pattern in ROOT, with an optional suffix of 'S' or
-
-
-
- Tutorial - 32 - Pattern Matching
-
-
-
-
-
- 'ES'. If ROOT matches, but is not followed by 'S' or 'ES', the
- null string matches and successfully completes the clause. Its
- presence gives the pattern match a successful escape.
-
- The conditional functions of the previous chapter may appear in
- patterns. If they fail when evaluated, the current alternative
- fails. If they succeed, they match the null string, and so do
- not consume any subject characters. They behave like a gate,
- allowing the match to proceed beyond them only if they are true.
- This pattern will match 'FOX' if N is 1, or 'WOLF' if N is 2:
-
- EQ(N,1) 'FOX' | EQ(N,2) 'WOLF'
-
- Parentheses may be used to factor a pattern. The strings
- 'COMPATIBLE', 'COMPREHENSIBLE', and 'COMPRESSIBLE' are matched by
- the pattern:
-
- 'COMP' ('AT' | 'RE' ('HEN' | 'S') 'S') 'IBLE'
-
-
- 6.5 SIMPLE PATTERN MATCHES
-
- Here are examples of pattern matches using a string literal or
- variable for the subject. The patterns consist entirely of known
- elements. Use the CODE.SNO program to experiment with them:
-
- ? 'BLUEBIRD' 'BIRD'
- Success
- ? 'BLUEBIRD' 'bird'
- Failure
- ? B = 'THE BLUEBIRD'
- ? B 'FISH'
- Failure
- ? B 'FISH' | 'BIRD'
- Success
- ? B ('GOLD' | 'BLUE') ('FISH' | 'BIRD')
- Success
-
- The first statement shows that the matching substring ('BIRD')
- need not begin at the start of the subject string. This is
- called "unanchored" matching. The second statement fails because
- strings are case sensitive, unlike names and labels. The third
- statement creates a variable to be used as the subject. The
- fifth statement employs an alternate: we are matching for 'FISH'
- or 'BIRD'.
-
- The last statement uses subsequents and alternates. We are
- looking for a substring in B that contains 'GOLD' or 'BLUE', fol-
- lowed by 'FISH' or 'BIRD'. It will match 'GOLDFISH', 'GOLDBIRD',
- 'BLUEFISH' or 'BLUEBIRD'. If the parentheses were omitted, con-
- catenation of 'BLUE' and 'FISH' would be performed before alter-
- nation, and the pattern would match 'GOLD', 'BLUEFISH', or
- 'BIRD'.
-
-
-
-
- Tutorial - 33 - Pattern Matching
-
-
-
-
-
- 6.6 THE PATTERN DATA TYPE
-
- If we execute the statement
-
- ? COLOR = 'BLUE'
-
- the variable COLOR contains the string 'BLUE', and could appear
- in the pattern portion of a statement:
-
- ? B COLOR
- Success
-
- Even though it is used as a pattern, COLOR has the "string"
- data type. However, complicated patterns may be stored in a
- variable just like a string or numeric value. The statement
-
- ? COLOR = 'GOLD' | 'BLUE'
-
- will create a "structure" describing the pattern, and store it in
- the variable COLOR. COLOR now has the "pattern" data type. The
- preceding example can now be written as:
-
- ? CRITTER = 'FISH' | 'BIRD'
- ? BOTH = COLOR CRITTER
- ? B BOTH
- Success
-
-
- 6.7 CAPTURING MATCH RESULTS
-
- If the pattern match
-
- B BOTH
-
- succeeds, we may want to know which of the many pattern alterna-
- tives were used in the match. The binary operator "conditional
- assignment" assigns the matching subject substring to a variable.
- The operator is called conditional, because assignment occurs
- ONLY if the entire pattern match is successful. Its graphic
- symbol is a period (.). It assigns the matching substring on its
- left to the variable on its right. Note that the direction of
- assignment is just the opposite of the statement assignment oper-
- ator (=). Continuing with the previous example, we'll redefine
- COLOR and CRITTER to use conditional assignment:
-
- ? COLOR = ('GOLD' | 'BLUE') . SHADE
- ? CRITTER = ('FISH' | 'BIRD') . ANIMAL
- ? BOTH = COLOR CRITTER
- ? B BOTH
- Success
- ? OUTPUT = SHADE
- BLUE
- ? OUTPUT = ANIMAL
- BIRD
-
-
-
- Tutorial - 34 - Pattern Matching
-
-
-
-
-
- The substrings that match the subpatterns COLOR and CRITTER are
- assigned to SHADE and ANIMAL respectively. The statement
-
- BOTH = COLOR CRITTER
-
- had to be reexecuted because its previous execution captured the
- old values of COLOR and CRITTER, without the conditional assign-
- ment operators. The redefinition of COLOR and CRITTER was not
- reflected in BOTH until the statement was reexecuted.
-
- Conditional assignment may appear at any level of pattern nest-
- ing, and may include other conditional assignments within its
- embrace. The pattern
-
- (('B' | 'F' | 'N') . FIRST 'EA' ('R' | 'T') . LAST) . WORD
-
- matches 'BEAR', 'FEAR', 'NEAR', 'BEAT', 'FEAT', or 'NEAT',
- assigning the first letter matched to FIRST, the last letter to
- LAST, and the entire result to WORD.
-
- The variable OUTPUT may be used as the target of conditional
- assignment. Try:
-
- ? 'B2' ('A' | 'B') . OUTPUT (1 | 2 | 3) . OUTPUT
- B
- 2
- Success
-
-
- 6.8 UNKNOWNS
-
- All of the previous examples used patterns created from literal
- strings. We may also want to specify the "qualities" of a match
- component, rather than its specific characters. Using unknowns
- greatly increases the power of pattern matching. There are two
- types, primitive patterns and pattern functions.
-
-
- 6.8.1 Primitive Patterns
-
- There are seven primitive patterns built into the SNOBOL4
- system. The two used most frequently will be discussed here.
- Chapter 9, "Advanced Topics," introduces the remaining five.
-
- -----------------------------------------------------------------
-
- REM Match remainder of subject
-
- REM is short for the REMainder pattern. It will match zero or
- more characters at the end of the subject string. Try the
- following:
-
-
-
-
-
-
- Tutorial - 35 - Pattern Matching
-
-
-
-
-
- ? 'THE WINTER WINDS' 'WIN' REM . OUTPUT
- TER WINDS
- Success
-
- The subpattern 'WIN' matched its first occurrence in the sub-
- ject, at the beginning of the word 'WINTER'. REM matched from
- there to the end of the subject string---the characters 'TER
- WINDS'---and assigned them to the variable OUTPUT. If we change
- the pattern slightly, to:
-
- ? 'THE WINTER WINDS' 'WINDS' REM . OUTPUT
-
- Success
-
- then 'WINDS' matches at the end of the subject string, leaving a
- null remainder for REM. REM matches this null string, assigns it
- to OUTPUT, and a blank line is displayed.
-
- The pattern components to the left of REM must successfully
- match some portion of the subject string. REM begins where they
- left off, matching all subject characters through the end of
- string. There are no restrictions on the particular characters
- matched.
-
- -----------------------------------------------------------------
-
- ARB Match arbitrary characters
-
- ARB matches an ARBitrary number of characters from the subject
- string. It matches the shortest possible substring, including
- the null string. The pattern components on either side of ARB
- determine what is matched. Try the statements
-
- ? 'MOUNTAIN' 'O' ARB . OUTPUT 'A'
- UNT
- Success
- ? 'MOUNTAIN' 'O' ARB . OUTPUT 'U'
-
- Success
-
- In the first statement, the ARB pattern is constrained on
- either side by the known patterns 'O' and 'A'. ARB expands to
- match the subject characters between, 'UNT'. In the second
- statement, there is nothing between 'O' and 'U', so ARB matches
- the null string. ARB behaves like a spring, expanding as needed
- to fill the gap defined by neighboring patterns.
-
-
- 6.8.2 Cursor Position
-
- During a pattern match, the "cursor" is SNOBOL4's pointer into
- the subject string. It is integer valued, and points "between"
- two subject characters. The cursor is set to zero when a pattern
- match begins, corresponding to a position immediately to the left
-
-
-
- Tutorial - 36 - Pattern Matching
-
-
-
-
-
- of the first subject character. As the pattern match proceeds,
- the cursor moves right and left across the subject to indicate
- where SNOBOL4 is attempting a match. The value of the cursor
- will be used by some of the pattern functions that follow.
-
- The "cursor position" operator assigns the current cursor value
- to a variable. It is a unary operator whose graphic symbol is
- the "at sign" (@). It appears within a pattern, preceding the
- name of a variable. By using OUTPUT as the variable, we can
- display the cursor position on the screen. For instance:
-
- ? 'VALLEY' 'A' @OUTPUT ARB 'E' @OUTPUT
- 2
- 5
- Success
- ? 'DOUBT' @OUTPUT 'B'
- 0
- 1
- 2
- 3
- Success
- ? 'FIX' @OUTPUT 'B'
- 0
- 1
- 2
- Failure
-
- Cursor assignment is performed whenever the pattern match
- encounters the operator, including retries. It occurs even if
- the pattern ultimately fails. The element @OUTPUT behaves like
- the null string---it doesn't consume subject characters or inter-
- fere with the match in any way.
-
-
- 6.8.3 Integer Pattern Functions
-
- These functions return a pattern based on their integer argu-
- ment. The pattern produced can be used directly in a pattern
- match statement, or stored in a variable for later retrieval.
-
- -----------------------------------------------------------------
-
- LEN(integer) Match fixed-length string
-
- LEN(I) produces a pattern which matches a string exactly I
- characters long. I must be an integer greater than or equal to
- zero. Any characters may appear in the matched string. For
- example, LEN(5) matches any 5-character string, and LEN(0)
- matches the null string. LEN may be constrained to certain por-
- tions of the subject by other adjacent patterns:
-
-
-
-
-
-
-
- Tutorial - 37 - Pattern Matching
-
-
-
-
-
- ? S = 'ABCDA'
- ? S LEN(3) . OUTPUT
- ABC
- ? S LEN(2) . OUTPUT 'A'
- CD
-
- The first pattern match had only one constraint---the subject
- had to be at least three characters long---so LEN(3) matched its
- first three characters. The second case imposes the additional
- restriction that LEN(2)'s match be followed immediately by the
- letter 'A'. This disqualifies the intermediate match attempts
- 'AB' and 'BC'.
-
- Using keyword &ALPHABET as the subject provides a simple way to
- convert a decimal character code between 0 and 255 to its one
- character equivalent. For example, by consulting an ASCII char-
- acter code chart we find that the BEL character is decimal 7. We
- can load that character into variable BEEP with one statement:
-
- ? &ALPHABET LEN(7) LEN(1) . BEEP
-
- and produce five beeps on the speaker with:
-
- ? OUTPUT = DUPL(BEEP,5)
-
- &ALPHABET contains all 256 members of the computer's character
- set, in ascending order. LEN(7) matches the first seven charac-
- ters (codes 0 - 6), leaving BEL as the next match position for
- LEN(1). This operation is analogous to the CHR$ function in
- BASIC.
-
- The inverse operation, obtaining the numerical value of a char-
- acter code, is also possible. If variable CHAR contains a one
- character string, variable N will be set to its decimal equiva-
- lent with the second statement below:
-
- ? CHAR = 'A'
- ? &ALPHABET @N CHAR
- ? OUTPUT = N
- 65
-
- In Chapter 8, "Program Defined Objects," I'll demonstrate how
- you can define your own functions to encapsulate each of these
- operations.
-
- -----------------------------------------------------------------
-
- POS(integer), RPOS(integer) Verify cursor position
-
- The POS(I) and RPOS(I) patterns do not match subject charac-
- ters. Instead, they succeed only if the "current" cursor posi-
- tion is a specified value. They often are used to tie points of
- the pattern to specific character positions in the subject.
-
-
-
-
- Tutorial - 38 - Pattern Matching
-
-
-
-
-
- POS(I) counts from the left end of the subject string, succeed-
- ing if the current cursor position is equal to I. RPOS(I) is
- similar, but counts from the right end of the subject. If the
- subject length is N characters, RPOS(I) requires the cursor be
- (N - I). If the cursor is not the correct value, these functions
- fail, and SNOBOL4 tries other pattern alternatives, perhaps
- extending a previous substring matched by ARB, or beginning the
- match further along in the subject.
-
- Continuing with CODE.SNO:
-
- ? S = 'ABCDA'
- ? S POS(0) 'B'
- Failure
- ? S LEN(3) . OUTPUT RPOS(0)
- CDA
- ? S POS(3) LEN(1) . OUTPUT
- D
- ? S POS(0) 'ABCD' RPOS(0)
- Failure
-
- The first example requires a 'B' at cursor position 0, and
- fails for this subject. POS(0) "anchors" the match, forcing it
- to begin with the first subject character. Similarly, RPOS(0)
- anchors the end of the pattern to the tail of the subject. The
- next example matches at a specific mid-string character position,
- POS(3). Finally, enclosing a pattern between POS(0) and RPOS(0)
- forces the match to use the ENTIRE subject string.
-
- At first glance these functions appear to be "setting" the
- cursor to a specified value. Actually, they never alter the
- cursor, but instead wait for the cursor to "come to them" as
- various match alternatives are attempted. This, in turn, allows
- other patterns in the statement to be processed in an orderly
- fashion. You can demonstrate this "waiting for the cursor"
- behavior like this:
-
- ? S @OUTPUT POS(3)
- 0
- 1
- 2
- 3
- Success
-
- -----------------------------------------------------------------
-
- TAB(integer), RTAB(integer) Match to fixed position
-
- These patterns are hybrids of ARB, POS(), and RPOS(). They use
- specific cursor positions, like POS and RPOS, but bind (match)
- subject characters, like ARB. TAB(I) matches any characters from
- the current cursor position up to the specified position I.
- RTAB(I) does the same, except, as in RPOS(), the target position
- is measured from the end of the subject.
-
-
-
- Tutorial - 39 - Pattern Matching
-
-
-
-
-
- TAB and RTAB will match the null string, but will fail if the
- current cursor is to the right of the target. They also fail if
- the target position is past the end of the subject string.
-
- These patterns are useful when working with tabular data. For
- example, if a data file contains name, street address, city and
- state in columns 1, 30, 60, and 75, this pattern will break out
- those elements from a line:
-
- P = TAB(29) . NAME TAB(59) . STREET TAB(74) . CITY REM . STATE
-
- The pattern RTAB(0) is equivalent to primitive pattern REM.
- One potential source of confusion is just what it is that RTAB
- matches. It counts from the right end of the subject, but
- matches to the left of its target cursor. Try:
-
- ? 'ABCDE' TAB(2) . OUTPUT RTAB(1) . OUTPUT
- AB
- CD
- Success
-
- TAB(2) matches 'AB', leaving the cursor at 2, between 'B' and
- 'C'. The subject is 5 characters long, so RTAB(1) specifies a
- target cursor of 5 - 1, or 4, which is between the 'D' and 'E'.
- RTAB matches everything from the current cursor, 2, to the
- target, 4.
-
-
- 6.8.4 Character Pattern Functions
-
- These functions produce a pattern based on a string-valued
- argument. Once again, the pattern may be used directly or stored
- in a variable.
-
- -----------------------------------------------------------------
-
- ANY(string), NOTANY(string) Match one character
-
- Each function produces a pattern which matches one character
- based upon the subject string. ANY(S) matches the next subject
- character IF it appears in the string S, and fails otherwise.
- NOTANY(S) matches a subject character only if it does NOT appear
- in S. Here are some sample uses of each:
-
- ? VOWEL = ANY('AEIOU')
- ? DVOWEL = VOWEL VOWEL
- ? NOTVOWEL = NOTANY('AEIOU')
- ? 'VACUUM' VOWEL . OUTPUT
- A
- ? 'VACUUM' DVOWEL . OUTPUT
- UU
- ? 'VACUUM' (VOWEL NOTVOWEL) . OUTPUT
- AC
-
-
-
-
- Tutorial - 40 - Pattern Matching
-
-
-
-
-
- The argument string specifies a set of characters to be used in
- creating the ANY or NOTANY pattern. It may contain duplicate
- characters, and the order of characters in S is immaterial.
-
- -----------------------------------------------------------------
-
- SPAN(string), BREAK(string) Match a run of characters
-
- These are multicharacter versions of ANY and NOTANY. Each
- requires a nonnull argument to specify a set of characters.
-
- SPAN(S) matches one or more subject characters from the set in
- S. SPAN must match at least one subject character, and will
- match the LONGEST subject string possible.
-
- BREAK(S) matches "up to but not including" any character in S.
- The string matched must always be followed in the subject by a
- character in S. Unlike SPAN and NOTANY, BREAK will match the
- null string.
-
- These two functions are called "stream" functions because each
- streams by a series of subject characters. SPAN is most useful
- for matching a group of characters with a common trait. For
- example, we can say an English word is composed of one or more
- alphabetic characters, apostrophe, and hyphen. The statements
-
- ? LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ'-"
- ? WORD = SPAN(LETTERS)
-
- produce a suitable pattern in WORD. To match the material
- between words (white space, punctuation, etc.), use the pattern:
-
- ? GAP = BREAK(LETTERS)
-
- SPAN and BREAK are two of the most useful SNOBOL4 functions.
- Try some examples using CODE.SNO:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 41 - Pattern Matching
-
-
-
-
-
- ? 'SAMPLE LINE' WORD . OUTPUT
- SAMPLE
- ? 'PLUS TEN DEGREES' ' ' WORD . OUTPUT
- TEN
- ? GAPO = GAP . OUTPUT
- ? WORDO = WORD . OUTPUT
- ? ': ONE, TWO, THREE' GAPO WORDO GAPO WORDO
- :
- ONE
- ,
- TWO
- DIGITS = '0123456789'
- ? INTEGER = (ANY('+-') | '') SPAN(DIGITS)
- ? 'SET -43 VOLTS' INTEGER . OUTPUT
- -43
- ? REAL = INTEGER '.' (SPAN(DIGITS) | '')
- ? 'SET -43.625 VOLTS' REAL . OUTPUT
- -43.625
- ? S = '0ZERO,1ONE,2TWO,3THREE,4FOUR,5FIVE,'
- ? S 4 BREAK(',') . OUTPUT
- FOUR
-
- If you require a version of SPAN which WILL match the null
- string, or a BREAK which will NOT match the null string, you can
- use the following constructions:
-
- (SPAN(S) | '')
- (NOTANY(S) BREAK(S))
-
- We need to introduce one more fundamental concept---replace-
- ment---before we can write some meaningful programs.
-
-
- 6.9 PATTERN MATCHING WITH REPLACEMENT
-
- Pattern matching identifies a subject substring with a particu-
- lar trait, specified by the pattern. We used conditional assign-
- ment to copy that substring to a variable. Replacement moves in
- the other direction, letting you alter the substring in the sub-
- ject. The space occupied by the matching substring may be en-
- larged or contracted (or removed entirely), leaving adjacent sub-
- ject characters undisturbed. If the pattern matched the entire
- subject, replacement behaves like a simple assignment statement.
-
- Replacement appears in a form similar to assignment:
-
- SUBJECT PATTERN = REPLACEMENT
-
- First, the pattern match is attempted on the subject. If it
- fails, execution of the statement ends immediately, and replace-
- ment does not occur. If the match succeeds, any conditional
- assignments within the pattern are performed. The replacement
- field is then evaluated, converted to a string, and inserted in
- the subject in place of the matching substring. If the replace-
-
-
-
- Tutorial - 42 - Pattern Matching
-
-
-
-
-
- ment field is empty, the null string replaces the matched sub-
- string, effectively deleting it. Try a few examples with
- CODE.SNO:
-
- ? T = 'MUCH ADO ABOUT NOTHING'
- ? T 'ADO' = 'FUSS'
- Success
- ? OUTPUT = T
- MUCH FUSS ABOUT NOTHING
- ? T 'NOTHING' =
- Success
- ? OUTPUT = T
- MUCH FUSS ABOUT
- ? 'MASH' 'M' = 'B'
- Execution error #8, Variable not present where required
- Failure
-
- The first replacement searches for 'ADO' in the subject string,
- replacing it with 'FUSS'. The second replacement has a null
- string replacement value, and deletes the matching substring.
- The last example demonstrates that a variable must be the subject
- of replacement. Variables can be changed; string literals---like
- 'MASH'---cannot.
-
- The following will replace the 'M' in 'MASH' with a 'B':
-
- ? VERB = 'MASH'
- ? VERB 'M' = 'B'
- ? OUTPUT = VERB
- BASH
-
- If the matched substring appears more than once in the subject,
- only the first occurrence is changed. The remaining substrings
- must be found with a program loop. For example, a statement to
- eliminate all occurrences of the letter 'A' from the subject
- looks like this:
-
- ALOOP SUBJECT 'A' = :S(ALOOP)
-
- Here ALOOP is the statement label, SUBJECT is some variable
- containing the subject string, 'A' is the pattern, and the
- replacement field is empty. If an 'A' is found, it is deleted by
- replacing it with the null string, and the statement succeeds.
- The success GOTO branches back to ALOOP, and another search for
- 'A' is performed. The loop continues until no 'A's remain in the
- subject, and the pattern match fails. Of course, the pattern and
- replacement can be as complex as desired.
-
- Simple loops like this can be tried in CODE.SNO by appending a
- semicolon after the GOTO field. (Semicolon is used with GOTO in
- CODE.SNO only; you would not use it in normal programs.) Contin-
- uing with the previous example:
-
-
-
-
-
- Tutorial - 43 - Pattern Matching
-
-
-
-
-
- ? VOWEL = ANY('AEIOU')
- ?VL T VOWEL = '*' :S(VL);
- ? OUTPUT = T
- M*CH F*SS *B**T
-
- Since conditional assignment is performed before replacement,
- its results are available for use in the replacement field of the
- same statement. Here's an example of removing the first item
- from a list, and placing it on the end:
-
- ? RB = 'RED,ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,'
- ? CYCLE = BREAK(',') . ITEM LEN(1) REM . REST
- ? RB CYCLE = REST ITEM ','
- Success
- ? OUTPUT = ITEM
- RED
- ? OUTPUT = RB
- ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,RED,
-
- Pattern CYCLE matches the entire subject, placing the first
- color into ITEM, bypassing the comma with LEN(1), and placing the
- remainder of the subject into REST. REST and ITEM are then
- transposed in the replacement field, and stored back into RB.
-
-
- 6.10 SAMPLE PROGRAMS
-
- I've introduced a lot of concepts in this chapter; it's time to
- see how they fit together into programs. They're supplied on the
- Vanilla SNOBOL4 diskette.
-
-
- 6.10.1 Word Counting
-
- The first program counts the number of words in the input file.
- Lines with an asterisk in the first column are comment lines---
- their contents are ignored by SNOBOL4.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 44 - Pattern Matching
-
-
-
-
-
- * Simple word counting program, WORDS.SNO.
- *
- * A word is defined to be a contiguous run of letters,
- * digits, apostrophe and hyphen. This definition of
- * legal letters in a word can be altered for specialized
- * text.
- *
- * If the file to be counted is TEXT.IN, run this program
- * by typing:
- * B>SNOBOL4 WORDS /I=TEXT
- *
- &TRIM = 1
- WORD = "'-" '0123456789' &UCASE &LCASE
- WPAT = BREAK(WORD) SPAN(WORD)
-
- NEXTL LINE = INPUT :F(DONE)
- NEXTW LINE WPAT = :F(NEXTL)
- N = N + 1 :(NEXTW)
-
- DONE OUTPUT = +N ' words'
- END
-
- After defining the acceptable characters in a word, the real
- work of the program is performed in the three lines beginning
- with label NEXTL. A line is read from the input file, and stored
- in variable LINE. The next statement attempts to find the next
- word with pattern WPAT. BREAK streams by any blanks and punctua-
- tion, stopping just short of the word, which SPAN then matches.
- Both the word and any preceding punctuation are removed from LINE
- by replacement with the null string.
-
- When no more words remain in LINE, the failure transfer to
- NEXTL reads the next line. If the match succeeds, N is incre-
- mented, and the program goes back to NEXTW to search for another
- word. When the End-of-File is encountered, control transfers to
- DONE and the number of words is displayed.
-
- It's simple to alter pattern WPAT to search for other things.
- For instance, if we wanted to count occurrences of double vowels,
- we could use:
-
- WPAT = ANY('AEIOUaeiou') ANY('AEIOUaeiou')
-
- To count the occurrences of integers with an optional sign
- character, use:
-
- WPAT = (ANY('+-') | '') SPAN('0123456789')
-
- Perhaps we want to count violations of simple punctuation
- rules: period with only one blank, or comma and semicolon fol-
- lowed by more than one blank:
-
- WPAT = '. ' NOTANY(' ') | ANY(',;') ' ' SPAN(' ')
-
-
-
-
- Tutorial - 45 - Pattern Matching
-
-
-
-
-
- Notice how closely WPAT parallels the English language descrip-
- tion of the problem.
-
-
- 6.10.2 Word Crossing
-
- This program asks for two words, and displays all intersecting
- letters between them. For example, given the words LOOM and
- HOME, the program output is:
-
- H
- LOOM
- M
- E
-
- H
- LOOM
- M
- E
-
- H
- O
- LOOM
- E
-
- A pattern match like this would find the first intersecting
- character:
-
- HORIZONTAL ANY(VERTICAL) . CHAR
-
- However, we want to find all intersections, so will have to
- iterate our search. In conventional programming languages, we
- might use numerical indices to remember which combinations were
- tried. Here, we'll use place-holding characters like '*' and '#'
- to remove solutions from future consideration. As seems to be
- the case with SNOBOL4, there are more comments than program
- statements:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 46 - Pattern Matching
-
-
-
-
-
- * CROSS.SNO - Print all intersections between two words
-
- &TRIM = 1
-
- * Get words from user
- *
- AGAIN OUTPUT = 'ENTER HORIZONTAL WORD:'
- H = INPUT :F(END)
-
- OUTPUT = 'ENTER VERTICAL WORD:'
- V = INPUT :F(END)
-
- * Make copy of horizontal word to track position.
- HC = H
-
- * Find next intersection in horizontal word. Save
- * the number of preceding horizontal characters in NH.
- * Save the intersecting character in CROSS.
- * Replace with '*' to remove from further consideration.
- * Go to AGAIN to get new words if horizontal exhausted.
- *
- NEXTH HC @NH ANY(V) . CROSS = '*' :F(AGAIN)
-
- * For each horizontal hit, iterate over possible
- * vertical ones. Make copy of vertical word to track
- * vertical position.
- *
- VC = V
-
- * Find where the intersection was in the vertical word.
- * Save the number of preceding vertical characters in NV.
- * Replace with '#' to prevent finding it again in that
- * position. When vertical exhausted, try horizontal again.
- *
- NEXTV VC @NV CROSS = '#' :F(NEXTH)
-
- * Now display this particular intersection.
- * We make a copy of the original vertical word,
- * and mark the intersecting position with '#'.
- *
- OUTPUT =
- PRINTV = V
- PRINTV POS(NV) LEN(1) = '#'
-
- * Peel off the vertical characters one-by-one. Each will
- * be displayed with NH leading blanks to get it in the
- * correct column. When the '#' is found, display the full
- * horizontal word instead.
- * When done, go to NEXTV to try another vertical position.
- *
- PRINT PRINTV LEN(1) . C = :F(NEXTV)
- OUTPUT = DIFFER(C,'#') DUPL(' ',NH) C :S(PRINT)
- OUTPUT = H :(PRINT)
- END
-
-
-
- Tutorial - 47 - Pattern Matching
-
-
-
-
-
- 6.11 ANCHORED AND UNANCHORED MATCHING
-
- Most of the examples above match substrings which do not begin
- at the first subject character. This is the "unanchored" mode of
- pattern matching. Alternately, we can "anchor" the pattern match
- by requiring it to include the first subject character. Setting
- keyword &ANCHOR to a nonzero value produces anchored matching.
- Anchored matching is usually faster than unanchored, because many
- futile attempts to match are eliminated.
-
- Even when the desired item is not at the beginning of the sub-
- ject, it is often possible to simulate anchored matching by pre-
- fixing the pattern with a subpattern which will stream out to the
- desired object. The stream function spans the gap from the first
- subject character to the desired item. Use CODE.SNO to experi-
- ment with &ANCHOR:
-
- ? DIGITS = '0123456789'
- ? &ANCHOR = 1
- ? 'THE NEXT 43 DAYS' BREAK(DIGITS) SPAN(DIGITS) . N
-
- This will assign substring '43' to N, even in anchored mode.
- In unanchored mode, the test lines:
-
- ? &ANCHOR = 0
- ? 'THE NEXT 43 DAYS' SPAN(DIGITS) . N
-
- would ultimately succeed, but only after SPAN failed on each of
- the characters preceding the '4'. The efficiency difference is
- more pronounced if the subject does not contain any digits. In
- the first formulation, BREAK(DIGITS) fails and the anchored match
- then fails immediately. The second construction fails only after
- SPAN is tried at each subject character position.
-
- When your program first begins execution, SNOBOL4 sets keyword
- &ANCHOR to zero, the unanchored mode. If you can construct all
- your patterns as anchored patterns, you should set &ANCHOR
- nonzero for anchored matching. Setting and resetting &ANCHOR
- throughout your program is error prone and not advised. Another
- alternative is to leave &ANCHOR set to 0, but to 'pseudo-anchor'
- patterns by using POS(0) as the first pattern element.
-
- It always takes less time for a pattern to succeed than to
- fail. Failure implies an exhaustive search of all combinations,
- whereas success stops the pattern match early. You should try to
- construct patterns with direct routes to success, such as the use
- of BREAK above. Wherever possible, impose restrictions on the
- number of alternatives to be tried. Combinatorial explosion is
- the price of loose pattern programming.
-
-
-
-
-
-
-
-
- Tutorial - 48 - Pattern Matching
-
-
-
-
-
- Chapter 7
-
-
- ADDITIONAL OPERATORS AND DATA TYPES
- -----------------------------------------------------------------
-
- In this chapter we will explore some additional SNOBOL4 opera-
- tors and data types. Many of these concepts are entirely absent
- from other programming languages. Far from being esoteric, they
- fit quite naturally into SNOBOL4, and add to its conciseness and
- power of expression. In the following examples, we will continue
- to use the CODE.SNO program to illustrate each new idea.
-
-
- 7.1 INDIRECT REFERENCE
-
- In conventional programming languages, a variable's name may be
- specified only at the time the program is written. In fact, once
- the run-time storage has been allocated, the textual form of the
- name can be discarded. This is not the case in SNOBOL4; you can
- create new variables during execution, and reference existing
- ones from names specified in character strings.
-
- The unary operator dollar sign ($) is the "indirect reference
- operator." By applying it to a variable you instruct SNOBOL4 to
- use its contents as the name of another variable, and to continue
- on to reference that variable. SNOBOL4 "goes through" the oper-
- and to reach the variable. Try the following simple example:
-
- ? DOG = 'BARK'
- ? CAT = 'MEOW'
- ? ANIMAL = 'CAT'
- ? OUTPUT = $ANIMAL
- MEOW
- ? ANIMAL = 'DOG'
- ? OUTPUT = $ANIMAL
- BARK
-
- These statements make their indirect reference through the
- string contained in variable ANIMAL. ANIMAL's contents are
- treated as a "pointer" to the final destination. That is, using
- ANIMAL by itself retrieves the string 'DOG', while $ANIMAL refers
- to the variable DOG.
-
- New variables may also be created by using an indirect refer-
- ence as the object of an assignment. Here, $DOG causes variable
- BARK to be created, and assigned the string 'RUFF':
-
- ? $DOG = 'RUFF'
- ? OUTPUT = BARK
- RUFF
-
- Indirect referencing may proceed to any depth, provided the
- null string is never encountered as a variable name:
-
-
-
- Tutorial - 49 - Operators and Data Types
-
-
-
-
-
- ? OUTPUT = $ANIMAL '-' $$ANIMAL
- BARK-RUFF
- ? OUTPUT = $RUFF
- Execution error #4, Null string in illegal context
-
- In the first example, $ANIMAL produces the contents of variable
- DOG, while $$ANIMAL refers to the variable BARK. The second ex-
- ample attempts to go through RUFF---which was not previously de-
- fined---and obtains the null string. Of course, the null string
- is not a valid variable name.
-
-
- 7.1.1 Associative Programming
-
- Indirect referencing provides a means of "programming by asso-
- ciation." Suppose we want to write a program that allows the
- user to enter a state name and receive the state's capital in
- response. We've provided a data file called CAPITAL.DAT, in
- which each line contains a state name, comma, and the capital.
- The first part of the program will read the file and set up an
- associative data base:
-
- * Trim input, attach data file to variable INFILE
- &TRIM = 1
- INPUT('INFILE', 1, , 'CAPITAL.DAT') :F(ERR)
-
- * Read a line from file. Start querying upon EOF
- READF LINE = INFILE :F(QUERY)
-
- * Break out state and capital from line
- LINE BREAK(',') . STATE LEN(1) REM . CAPITAL :F(ERR)
-
- * Convert state name into a variable, and assign the
- * capital city string to it. Then read next line.
- $STATE = CAPITAL :(READF)
-
- ERR OUTPUT = 'Illegal data file' :(END)
- QUERY . . .
-
- We attach the file, and associate variable INFILE with it.
- Successive file lines are read into variable LINE. Pattern
- matching assigns the state name and capital city to variables
- STATE and CAPITAL respectively. We use an indirect reference
- through $STATE to create a new variable with the state's name,
- and assign the capital city to it. For example, the file line
- 'COLORADO,DENVER' creates variable COLORADO, containing 'DENVER'.
-
- Having established a data base, completing the program to ac-
- cess it is trivial:
-
- * Read state name, access it as a variable
- QUERY OUTPUT = $INPUT :S(QUERY)
- END
-
-
-
-
- Tutorial - 50 - Operators and Data Types
-
-
-
-
-
- An input line is read from the user, and used for an indirect
- reference. If the user types a state name, treating it as a
- variable name obtains the state capital. An invalid state name
- would reference a new variable, whose value is the null string,
- and a blank line would be output. A more complete program might
- test for this null string and produce an error message.
-
- The addition of one statement to the program loop creating the
- data base allows us to enter either the state name or capital
- city, and obtain the other:
-
- $STATE = CAPITAL
- $CAPITAL = STATE :(READF)
-
- How would we solve this problem in a language like BASIC?
- States and capitals could be stored in an array. We would then
- use a loop to sequentially compare the user's input string with
- the array elements. If a match were found, the result would be
- displayed from another array element. In SNOBOL4, we did it all
- with one statement: OUTPUT = $INPUT. Associative programming can
- often replace a conventional linear search.
-
-
- 7.1.2 Variable Names
-
- Earlier I said that variable names were composed of letters,
- digits, and the characters period and underscore. These restric-
- tions apply only to variables which appear in program text. Var-
- iable names created or referenced with the indirect reference
- operator may be composed of ANY nonnull string of characters, and
- may be as long as any other string. If we set keyword &DUMP
- nonzero, we would see a list of states and capitals when the
- program terminated. The variable names created by $STATE are in
- the left column, and their string contents in the right column:
-
- ALABAMA = 'MONTGOMERY'
- ALASKA = 'JUNEAU'
- . . .
- NEW HAMPSHIRE = 'CONCORD'
- . . .
-
- The dump reveals a variable named NEW HAMPSHIRE, which contains
- a "blank" within its name. Clearly, you cannot directly say:
-
- NEW HAMPSHIRE = 'CONCORD'
-
- since SNOBOL4 sees this as a pattern match statement: variable
- NEW is the subject, and variable HAMPSHIRE contains the pattern.
- To reference this variable, we must use:
-
- $'NEW HAMPSHIRE' = 'CONCORD'
-
- Try CODE.SNO with some unconventional variable names:
-
-
-
-
- Tutorial - 51 - Operators and Data Types
-
-
-
-
-
- ? $'"' = 'DOUBLE QUOTE'
- ? $'$#@!*' = 53
- ? OUTPUT = $'$#@!*' $'"'
- 53DOUBLEQUOTE
-
-
- 7.1.3 Indirect GOTOs
-
- Indirect referencing is not restricted to the main body of a
- statement. It may be used in the GOTO field to transfer control
- to a label specified by a variable. Suppose variable OP held the
- one-character string '+', '-', '*', or '/'. This GOTO would
- transfer to one of four statements, labeled L+, L-, L*, or L/:
-
- statement :($('L' OP))
- L+ statement
- L- statement
- . . .
-
- The string in OP is appended to string 'L', and the result is
- used with an indirect reference to obtain the final label name.
-
- Indirect referencing in the GOTO field is a more powerful ver-
- sion of the computed GOTO which appears in some languages. It
- allows a program to quickly perform a multiway control branch
- based on an item of data. Of course, the computed label name
- must be defined in the program. SNOBOL4 provides an error mes-
- sage if your program transfers to an undefined label.
-
- Indirect referencing may not be used in a statement's label
- field. Dynamically changing the name of a statement during exe-
- cution is excessive even by SNOBOL4 standards.
-
-
- 7.2 UNEVALUATED EXPRESSIONS
-
- The pattern data type appears when a pattern structure is
- stored in a variable for subsequent use in a pattern match. For
- example, a pattern to capture the next N characters after a
- colon, and store them in variable ITEM could be written as:
-
- NPAT = ':' LEN(N) . ITEM
-
- Notice that a definition such as this is static. NPAT captures
- the value of variable N at the time of pattern construction. If
- we subsequently alter N in the program, NPAT retains N's original
- value. One way to use the current value of N is to explicitly
- specify the pattern each time it is needed:
-
- SUBJECT ':' LEN(N) . ITEM
-
- Now the pattern is being constructed anew whenever the state-
- ment is executed. But reconstructing a pattern whenever it is
- used is inefficient, so a one-time definition is preferable.
-
-
-
- Tutorial - 52 - Operators and Data Types
-
-
-
-
-
- The "unevaluated expression" operator allows us to obtain the
- efficiency of the NPAT formulation, yet use the current value of
- N when NPAT is referenced. It is a unary operator, whose graphic
- symbol is the asterisk (*). Now we would specify NPAT like this:
-
- NPAT = ':' LEN(*N) . ITEM
-
- The pattern is only constructed once, and assigned to NPAT.
- The current value of N is ignored at this time. Later, when NPAT
- is used in a pattern match, the unevaluated expression operator
- tells SNOBOL4 to fetch the current value of N.
-
- The unevaluated expression operator may be used with the argu-
- ment of the pattern functions ANY, BREAK, LEN, NOTANY, POS, RPOS,
- RTAB, SPAN, or TAB. It may also be applied to an alternate or
- subsequent clause or to an entire pattern. Here's an example:
-
- ? PAT = TAB(*I) . OUTPUT SPAN(*S) . OUTPUT
- ? SUB = '123AABBCC'
- ? I = 4
- ? S = 'AB'
- ? SUB PAT
- 123A
- ABB
- ? I = 3
- ? SUB PAT
- 123
- AABB
-
- It's worth noting that I and S were undefined when PAT was
- first constructed. Later, we will apply this technique to con-
- struct recursive patterns.
-
-
- 7.3 IMMEDIATE ASSIGNMENT
-
- Our examples have made extensive use of the conditional assign-
- ment operator to capture matched substrings after a successful
- pattern match. The "immediate assignment" operator allows us to
- capture intermediate results during the pattern match.
-
- Immediate assignment occurs whenever a subpattern matches, even
- if the entire pattern match ultimately fails. Immediate assign-
- ment is a binary operator whose graphic symbol is the dollar sign
- ($). Like conditional assignment, the matching substring on its
- left is assigned to the variable on its right. Here are examples
- with CODE.SNO where we use variable OUTPUT to reveal the work of
- the pattern matcher:
-
-
-
-
-
-
-
-
-
- Tutorial - 53 - Operators and Data Types
-
-
-
-
-
- ? S = 'ABCDEFG'
- ? S 'A' ARB $ OUTPUT 'E'
-
- B
- BC
- BCD
- Success
- ? S ('B' LEN(2) | 'C' LEN(3)) $ OUTPUT 'G'
- BCD
- CDEF
- Success
- ?
-
-
- 7.3.1 Immediate Assignment and Unevaluated Expressions
-
- As useful as immediate assignment is for revealing the inner
- workings of a pattern match, a more powerful use is possible. It
- can be used with the unevaluated expression operator to develop a
- new class of patterns. An interesting substring at the beginning
- of the subject is immediately assigned to a variable, and the
- variable is then subsequently used in the very same pattern.
-
- Suppose a number at the beginning of the subject specifies the
- length of a variable width field that follows. We would like to
- capture the number into variable N, then use it with the LEN
- function to transfer the data into variable FIELD. When used
- with LEN, N must be preceded by the unevaluated expression opera-
- tor, so that its new value is retrieved. For instance:
-
- ? FPAT = SPAN('0123456789') $ N LEN(*N) . FIELD
- ? '12ABCDEFGHIJKLMNOPQ' FPAT
- Success
- ? OUTPUT = FIELD
- ABCDEFGHIJKL
-
- SPAN matched the field length, 12, and immediately assigned it
- to N. LEN(*N) then matched the next 12 characters. Another sub-
- ject, with a different field length, would update N appropri-
- ately. Type conversion was working quietly behind the scenes
- here: N was assigned the string '12', yet it appeared as integer
- 12 to the LEN function.
-
- Now here is an example which provides a glimpse of just how
- powerful SNOBOL4's pattern matching can be. Problem: Examine a
- subject for an arbitrary three-character substring which appears
- twice in a row, or bracketed in parentheses. Solution:
-
-
-
-
-
-
-
-
-
-
- Tutorial - 54 - Operators and Data Types
-
-
-
-
-
- ? TWOPAT = LEN(3) $ X . OUTPUT *(X | "(" X ")")
- ? 'ABCDECDEFGH' TWOPAT
- CDE
- Success
- ? 'ABCDE(CDE)BA' TWOPAT
- CDE
- Success
-
- As you experiment with these types of patterns, you may dis-
- cover some which fail when they should succeed. The problem is
- that SNOBOL4 stops matching when it believes further match at-
- tempts would be futile. These "heuristics" are normally invisi-
- ble, and speed program execution. At this time, we'll defer dis-
- cussing heuristics, and simply mention that they can be disabled
- with the statement:
-
- &FULLSCAN = 1
-
- Let's take a break from pattern matching, and examine some
- other SNOBOL4 data types.
-
-
- 7.4 ARRAYS
-
-
- 7.4.1 Array Concepts
-
- Arrays in SNOBOL4 are similar to arrays in other programming
- languages. They allow a single variable name to specify more
- than one data element; integer subscripts distinguish the indi-
- vidual members of an array. Each array element may contain any
- data type, independent of the types in other array elements.
-
- A one-dimensional array is a "vector;" it is simply a list of I
- items. A two-dimensional array is a "grid" composed of several
- adjacent vectors---an I by J array has I rows and J columns. A
- three-dimensional array, I by J by K in size, is a rectangular
- solid consisting of K adjacent grids. There's no limit to the
- number of dimensions allowed, but such arrays become increasingly
- difficult to visualize.
-
- In keeping with SNOBOL4's pliability, an array is defined dur-
- ing program execution, rather than at compilation time. Its size
- and shape is specified by a string. The definition of an array
- may be changed at any time, or the array may be deleted and its
- memory reused when it is no longer needed.
-
-
- 7.4.2 Array Creation
-
- Arrays are created by the SNOBOL4 function ARRAY. A program
- calls this function with a "prototype string" which specifies the
- number of dimensions and their sizes. The function returns an
- "array pointer," which is stored in a variable; the array ele-
-
-
-
- Tutorial - 55 - Operators and Data Types
-
-
-
-
-
- ments are referenced by applying subscripts to this variable.
- Here are two statements for use with CODE.SNO. They create one-
- and two-dimensional arrays named LIST and BOX respectively:
-
- ? LIST = ARRAY('25')
- ? BOX = ARRAY('12,3')
-
- LIST points to a vector of 25 elements. BOX points to a grid,
- 12 rows high and 3 columns wide, containing 36 elements. The
- ARRAY function initializes all array elements to the null string.
-
-
- 7.4.3 Array Referencing
-
- Array subscripts are integer valued, and are specified by angu-
- lar or square brackets (<> or []). Subscript values range from 1
- to the size of each dimension. If you attempt to use a subscript
- outside this range, the array reference will fail, and the fail-
- ure may be detected in the GOTO portion of the statement. Try
- some array references with CODE.SNO:
-
- ? LIST<3> = 'MAPLE'
- ? BOX[10,2] = 3
- ? LIST[33] = 4
- Failure
- ? OUTPUT = LIST[3] LIST[4] BOX<10,2>
- MAPLE3
-
- Angular and square brackets are interchangeable. The reference
- to LIST[33] failed because the largest subscript allowed for that
- array is 25. LIST[4] produced its initialized value, the null
- string, and had no effect on the concatenation. The array
- pointer in LIST can be assigned to another variable:
-
- ? B = LIST
- ? OUTPUT = B[3]
- MAPLE
- ? B<3> = 'WILLOW'
- ? OUTPUT = LIST<3>
- WILLOW
-
- Assigning the pointer in LIST to B made both variables point to
- the same array. Since there's but one actual array, array refer-
- ences made using LIST or B are equivalent. The COPY function
- (Chapter 19) creates a duplicate copy of an entire array.
-
- Array elements may be used anywhere a variable name is
- allowed---expressions, patterns, function arguments, etc. The
- fact that an array reference fails if a subscript is out-of-
- bounds can be used in a simple and natural way when scanning an
- array. Rather than having to know an array's size, we simply
- loop until an array reference fails. A program segment to dis-
- play the members of an array SCORE might look like this:
-
-
-
-
- Tutorial - 56 - Operators and Data Types
-
-
-
-
-
- I = 0
- PRINT I = I + 1
- OUTPUT = SCORE[I] :S(PRINT)
- . . .
-
-
- 7.4.4 Array Initialization
-
- Arrays may be created with an initial value other than the null
- string. ARRAY accepts a second argument which specifies this
- initial value. We can create a three-dimensional array with all
- elements initialized to the string 'PA-18' as follows:
-
- ? A = ARRAY('2,3,4','PA-18')
- ? OUTPUT = A[1,2,3]
- PA-18
-
-
- 7.4.5 Other Array Bounds
-
- Ordinarily, subscripts range from 1 to the size of each dimen-
- sion. However, if you find it more convenient, other subscript
- ranges may be used. The prototype string for ARRAY's first
- argument has the general form:
-
- 'L1:H1,L2:H2,...,Ln:Hn'
-
- The L's and H's are integers specifying the lower and upper
- bounds of each dimension. If the lower bound and colon are
- omitted from any dimension, the integer 1 is assumed. Here is a
- five element vector, with allowed subscripts -2, -1, 0, 1 and 2:
-
- ? A = ARRAY('-2:2','PIPER')
- ? OUTPUT = A[-1]
- PIPER
- ? OUTPUT = A[3]
- Failure
-
- Arrays are a traditional computer programming concept. Now
- we'll see how SNOBOL4 takes the idea one important step further,
- with the concept of tables.
-
-
- 7.5 TABLES
-
-
- 7.5.1 Table Creation and Referencing
-
- A "table" is similar to a one-dimensional array, with two
- important differences. First, a table's size is not fixed; it
- extends itself automatically whenever a new element is added to
- it. Second, table subscripts are not limited to integers, but
- may be any SNOBOL4 data type. Strings and patterns may be used
- as subscripts. Tables combine the idea of associative program-
-
-
-
- Tutorial - 57 - Operators and Data Types
-
-
-
-
-
- ming with the data grouping of arrays.
-
- Tables are created by the SNOBOL4 function TABLE. No arguments
- are required, since a table's size is not fixed. The function
- returns a table pointer, which you store in a variable. Like
- arrays, table elements are referenced by applying subscripts to
- the variable. Try this example with CODE.SNO:
-
- ? T = TABLE()
- ? T['ROSE'] = 'RED'
- ? T['N'] = 6
- ? OUTPUT = T['N'] T['THE'] T['ROSE']
- 6RED
- ? FLOWER = 'ROSE'
- ? T[FLOWER] = T[FLOWER] ',THORNS'
- ? OUTPUT = T[FLOWER]
- RED,THORNS
-
- Here, strings have been used as table subscripts. The concept
- of an "out-of-bounds" subscript does not exist with tables. The
- reference to T['THE'] created a new entry, and assigned it the
- null string. Unlike arrays, no initial value for new entries may
- be specified in the call to TABLE; new table entries are always
- initialized to the null string.
-
-
- 7.5.2 Conversion between Tables and Arrays
-
- In the above example, we know what values were used as table
- subscripts. But if the table were constructed from data in a
- file, how can we determine what items were placed in the table?
- We need to know the subscripts to view the table, but the sub-
- scripts themselves are part of the table. If this were an array,
- we could run an integer subscript over the array to see the data.
- Applying integer subscripts to a table only creates more entries.
-
- SNOBOL4 provides a simple solution to this dilemma---a method
- to convert a table to an array. An N row by 2 column array can
- be created from a table. The first array column contains the
- subscripts that were used to create the table. The second column
- contains the data items stored with the corresponding table sub-
- script. N is the number of table entries with nonnull values.
-
- Once the table is in array form, integer subscripts can be
- applied to the array to display the subscripts and their values.
- A table is converted to an array with the CONVERT function, which
- accepts a table argument and the word 'ARRAY', and returns a
- pointer to the new array. Continuing with the previous example:
-
-
-
-
-
-
-
-
-
- Tutorial - 58 - Operators and Data Types
-
-
-
-
-
- ? A = CONVERT(T, 'ARRAY')
- Success
- ? OUTPUT = A[1,1] '-' A[1,2]
- ROSE-RED,THORNS
- ? OUTPUT = A[2,1] '-' A[2,2]
- N-6
-
- As you would expect with SNOBOL4, the inverse operation---con-
- version of an array to a table---is also possible. The array
- must be rectangular, N rows by 2 columns. The array entries in
- the first column become the table subscripts. The array's second
- column becomes the table entry values:
-
- ? W = CONVERT(A, 'TABLE')
- Success
- ? OUTPUT = W['ROSE']
- RED,THORNS
-
-
- 7.5.3 Counting Word Usage with a Table
-
- Tables are useful when we want to record a number of pair asso-
- ciations, where each half of the pair might have any data type.
- A classic example of a table's utility is a word usage program.
- Earlier, we developed a program to count the total number of
- words in a file. We will modify that program to count the number
- of times each unique word appears. The program begins like this:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 59 - Operators and Data Types
-
-
-
-
-
- * Simple word usage program, WORDU.SNO.
- *
- * A word is defined to be a contiguous run of letters,
- * digits, apostrophe and hyphen. This definition of legal
- * letters in a word can be altered for specialized text.
- *
- * If the file to be counted is TEXT.IN, run as follows:
- * B>SNOBOL4 WORDU /I=TEXT
- *
- &TRIM = 1
-
- * Define the characters which comprise a 'word'
- WORD = "'-" '0123456789' &LCASE
-
- * Pattern to isolate each word as assign it to ITEM:
- WPAT = BREAK(WORD) SPAN(WORD) . ITEM
-
- * Create a table to maintain the word counts
- WCOUNT = TABLE()
-
- * Read a line of input and obtain the next word
- NEXTL LINE = REPLACE(INPUT, &UCASE, &LCASE) :F(DONE)
- NEXTW LINE WPAT = :F(NEXTL)
-
- * Use word as subscript, update its usage count
- WCOUNT<ITEM> = WCOUNT<ITEM> + 1 :(NEXTW)
- DONE . . .
-
- We'll convert the input to lower case, so words like 'The' and
- 'the' are counted together. WPAT has been changed to store each
- word in variable ITEM. When a word is identified, it is used as
- a subscript for table WCOUNT. When ITEM contains a new word, the
- first reference to WCOUNT<ITEM> creates a new table entry and
- returns the null string. Integer 1 is added to the null string,
- and the result, 1, is stored back into WCOUNT<ITEM>. If the same
- word is encountered again, WCOUNT<ITEM> for that word will be
- incremented to 2.
-
- The program reads the input file, building a table with entries
- for each unique word. When End-of-File is read, control trans-
- fers to label DONE, where we display the words and their respec-
- tive counts. We convert WCOUNT to an array, and use integer
- subscripts to retrieve the words and their counts. Conversion
- fails if the table is empty. Continuing with this program:
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 60 - Operators and Data Types
-
-
-
-
-
- * Convert table to array. Fail if table is empty
- DONE A = CONVERT(WCOUNT, 'ARRAY') :F(EMPTY)
-
- * Scan array, printing words and counts
- I = 0
- PRINT I = I + 1
- OUTPUT = A<I,1> '--' A<I,2> :S(PRINT) F(END)
-
- EMPTY OUTPUT = 'No words'
- END
-
- The table subscripts were the file's words, and have been
- placed in the first column of the array, A<I,1>. The count for
- each word was the table entry, now in the second column, A<I,2>.
- Tables are very convenient for recording information about data
- items, while conversion to an array makes it easy to systemati-
- cally examine the recorded information.
-
-
- 7.6 THE NAME OPERATOR
-
- The unary name operator provides the address or location in
- memory where a variable is stored. Its graphic symbol is the
- period (.). We'll introduce it here through an example.
-
- Consider the indirect reference operator mentioned earlier.
- Suppose we want to use a variable to point to different elements
- of an array or table. If we try the following, we immediately
- discover a problem:
-
- ? A = ARRAY('10,10')
- ? A[4,2] = 'DOG'
- ? V = 'A[4,2]'
- ? OUTPUT = $V
-
- Success
-
- The indirect reference operator treats the string 'A[4,2]' as a
- variable name, rather than an array element. Remember, any char-
- acter sequence can be used indirectly to create a variable.
- SNOBOL4 creates a variable called A[4,2] that has absolutely no
- connection with array A. The fact that this character sequence
- happens to look like an array reference to us is purely coinci-
- dental from SNOBOL4's point of view.
-
- To make this work, the name operator is applied to A[4,2] to
- obtain the address of that array element. The address can be
- stored in variable V, and referenced with the indirect operator:
-
- ? V = .A[4,2]
- ? OUTPUT = $V
- DOG
-
- The name operator provides a general method for specifying the
-
-
-
- Tutorial - 61 - Operators and Data Types
-
-
-
-
-
- name of an object. Both of these statements are correct for
- specifying the first argument to the INPUT function:
-
- INPUT('INFILE', 1, , 'CAPITAL.DAT')
- INPUT(.INFILE, 1, , 'CAPITAL.DAT')
-
- Either form, 'INFILE' or .INFILE, tells the INPUT function the
- name of the variable to be input associated. However, using the
- name operator allows us to associate a file with an array or
- table element:
-
- INPUT('A[4,2]', 1, , 'CAPITAL.DAT') (incorrect)
- INPUT(.A[4,2], 1, , 'CAPITAL.DAT')
-
- Note that alternate use of the indirect reference and name
- operators "cancel" one another, so
-
- ? OUTPUT = $(.($(.A[4,2])))
- DOG
-
- is simply a reference to A[4,2].
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 62 - Operators and Data Types
-
-
-
-
-
- Chapter 8
-
-
- PROGRAM-DEFINED OBJECTS
- -----------------------------------------------------------------
-
- SNOBOL4 is a very large and rich language, providing a diverse
- assortment of built-in features. It is also an extensible lan-
- guage; it allows you to define new data types, functions, and
- operators. You can, by creating your own entities, obtain
- another level of conciseness and power of expression.
-
- We will begin with program-defined functions because they allow
- a program to be partitioned into smaller, more manageable seg-
- ments. As functions tend to be just a few lines long, transfers
- of control within them are usually obvious and manageable. If
- your main program has complex, intertwined GOTOs, consider how
- the use of functions would clarify things.
-
- Functions also allow us to postpone the complete development of
- an algorithm. We can design the overall program structure, using
- function names for components which will be developed later.
- Furthermore, if a particular function proves inefficient, it can
- be replaced later with an improved version.
-
-
- 8.1 PROGRAM-DEFINED FUNCTIONS
-
- The concept of a function should be clear from all the examples
- of SNOBOL4's built-in functions. A function accepts some number
- of arguments, performs a computation based on their values, and
- returns a result and a success signal. A function can also sig-
- nal failure, and not return any value.
-
-
- 8.1.1 Function Definition
-
- We can define a new function by specifying its name and argu-
- ments. The definition will be composed of "dummy arguments"---
- place holders that show how the arguments are to be used in the
- function. Later, when the function is called, the actual argu-
- ments will replace the dummy arguments in the computation.
-
- We define a new function in SNOBOL4 by using the built-in func-
- tion DEFINE. We call it with a "prototype string" containing the
- new function's name and arguments. DEFINE makes the new func-
- tion's name known to SNOBOL4, so it can be used subsequently.
-
- Suppose we want to create a new function called SHIFT, which
- would circularly rotate a string through a specified number of
- character positions. We'll define all rotations as being to the
- left---characters removed from the front of the string are placed
- back on the end. For example, SHIFT('ENGRAVING',3) would return
- the string 'RAVINGENG'.
-
-
-
- Tutorial - 63 - Program-Defined Objects
-
-
-
-
-
- We will begin by defining the function name and its dummy argu-
- ments, S and N. Any names of your choosing can by used for dummy
- arguments. In a program, it would look like this:
-
- DEFINE('SHIFT(S,N)')
-
- It is important to realize that the DEFINE function must be
- executed for the definition to occur. Most other programming
- languages process function definitions when a program is com-
- piled. SNOBOL4's system is more flexible; the prototype string
- can itself be the result of other run-time computations. In an
- extreme case, data input to a program could determine the names
- and kinds of functions to be defined.
-
-
- 8.1.2 The Function Body
-
- Having declared the function name and dummy arguments, we need
- to provide the statements which will implement the function. A
- very simple convention applies:
-
- When the function is used, SNOBOL4 transfers control to
- a statement label with the same name as the function.
-
- In this case, the first statement of the function would be
- labeled SHIFT. There is no limit to the number of statements
- comprising the function body.
-
-
- 8.1.3 Returning Function Results
-
- First, a function may return a value by assigning it to a vari-
- able with the same name as the function. If no assignment
- occurs, the result is the null string.
-
- Second, the function must tell SNOBOL4 that it is finished, and
- that control should return back to the caller. It does this by
- transferring to the special label RETURN.
-
- The label RETURN should not appear anywhere in your program.
- It is a special name, reserved by SNOBOL4 for just this purpose.
-
- With this information, we can now write our SHIFT function. We
- will remove the first N characters from the beginning of the ar-
- gument string, and place them on the end. The function body
- looks like this:
-
- SHIFT S LEN(N) . FRONT REM . REST
- SHIFT = REST FRONT :(RETURN)
-
- Each time SHIFT is called, the particular arguments used are
- placed in S and N. The first statement splits S into two parts,
- assigning them to variables FRONT and REST. The second statement
- reassembles them in the shifted order, and assigns them to vari-
-
-
-
- Tutorial - 64 - Program-Defined Objects
-
-
-
-
-
- able SHIFT, to be returned as the function result. The GOTO then
- transfers to label RETURN to return back to the caller.
-
-
- 8.1.4 Function Failure
-
- What happens if we try the function call SHIFT('PEAR',7)? As
- the function is defined above, the pattern match would fail,
- since LEN(7) is longer than the subject string. The assignment
- to FRONT and REST would not take place, and the function would
- return an erroneous result.
-
- Now we could extend the definition of SHIFT to cycle the
- argument string multiple times. In general, though, we want to
- develop a convenient method that allows a function to signal an
- exceptional condition back to the caller. Function failure
- allows us to do just that. Another convention is provided:
-
- Transferring to the special label FRETURN returns from
- a function signaling failure to the caller. No value
- is returned as the function result.
-
- We can now rework the function body to signal failure when N is
- too large. In this case, the pattern match fails, and we detect
- the failure in the GOTO field:
-
- SHIFT S LEN(N) . FRONT REM . REST :F(FRETURN)
- SHIFT = REST FRONT :(RETURN)
-
- In general, the transfer to FRETURN does not need to be the
- result of the failure of a particular statement. Any success or
- failure could be tested to produce a transfer to FRETURN. For
- example, if we decided to explicitly test the length of S, the
- function could begin with:
-
- SHIFT GT(N, SIZE(S)) :S(FRETURN)
- . . .
-
-
- 8.1.5 Local Variables
-
- FRONT and REST were used in this function as temporary vari-
- ables to rearrange the argument string. If they had appeared
- elsewhere in your program, their old values would be destroyed.
- Such inadvertent conflicts become harder to avoid as your func-
- tion library grows. The prototype string used with DEFINE can
- specify "local variables" to be protected when the function is
- called. For our SHIFT function, the call would look like this:
-
- DEFINE('SHIFT(S,N)FRONT,REST')
-
- The local variables appear after the argument list. When SHIFT
- is called, any existing values for FRONT and REST will be saved
- on a pushdown stack. FRONT and REST are set to the null string,
-
-
-
- Tutorial - 65 - Program-Defined Objects
-
-
-
-
-
- and control is transferred to the first statement of the function
- body. When the function returns, FRONT and REST are restored to
- their previous values.
-
- Since the same potential problem exists for dummy arguments S
- and N, SNOBOL4 automatically saves their values before assigning
- the actual arguments to them. And just like local variables,
- when the function returns, the dummy arguments are restored to
- their original values.
-
-
- 8.1.6 Using Functions
-
- Once a function has been defined, it may be used in exactly the
- same manner as a built-in function. It may appear in a statement
- anywhere its value is needed---in the subject, pattern, or
- replacement fields. If used with the indirect reference opera-
- tion, functions may even be used in the GOTO field. Of course, a
- function may be used as the argument of another function.
-
- The value returned by a function is not restricted to strings.
- Any SNOBOL4 data type, including patterns, may be returned. Ear-
- lier, in the pattern match chapter, we showed how simple patterns
- could be tailored to our needs by using them in more complicated
- clauses. The specific example was a variation of the BREAK pat-
- tern which would not match the null string. Let's use a program-
- defined function to create a new function, BREAK1, with this
- property. The definition statement might look like this:
-
- DEFINE('BREAK1(S)')
-
- and the function body, like this:
-
- BREAK1 BREAK1 = NOTANY(S) BREAK(S) :(RETURN)
-
- This function can now be used directly in a pattern match. For
- example, BREAK1('abc') constructs a pattern which matches a non-
- null string, up to the occurrence of the letters 'a', 'b', or
- 'c'. Of course, the pattern returned by a function can be as
- complex as desired, giving us an elegant method to define our own
- pattern matching primitives.
-
-
- 8.1.7 Organizing Functions
-
- SNOBOL4 does not know or care which statements belong to a
- particular function. There is no explicit END statement for
- individual functions. To keep programs readable, we'll have to
- impose some discipline of our own. Also, having to execute the
- DEFINE function is a mixed blessing. It offers tremendous flexi-
- bility, but requires us to place all our DEFINE's at the begin-
- ning of a program. Here is the system proposed by Gimpel, which
- I like to use to manage functions and their definitions:
-
-
-
-
- Tutorial - 66 - Program-Defined Objects
-
-
-
-
-
- We keep the function definition, any one-time initialization,
- and the function body together as a unit. A GOTO transfers con-
- trol around the function body after the definition and initial-
- ization statements are executed. Also present are comments de-
- scribing its use and any exceptional conditions. Rewriting the
- SHIFT function in this form, and taking this opportunity to avoid
- rebuilding the pattern each time the function is called, it looks
- like this:
-
- * SHIFT(S,N) - Shift string S left N character positions.
- * As characters are removed from the left side of the
- * string, they are placed on the end.
- *
- * The function fails if N is larger than the size of S.
-
- DEFINE('SHIFT(S,N)FRONT,REST')
- SHIFT_PAT = LEN(*N) . FRONT REM . REST :(SHIFT_END)
-
- SHIFT S SHIFT_PAT :F(FRETURN)
- SHIFT = REST FRONT :(RETURN)
- SHIFT_END
-
- Now this group of lines can be incorporated as a unit into the
- beginning of any program that wants to use it. When execution
- begins, the first statement defines the SHIFT function. Next we
- define a pattern, called SHIFT_PAT, for use when the function is
- called. The pattern definition is only executed once, so we use
- the unevaluated expression operator (*N) to obtain the current
- value of N on each function call. After defining the pattern, we
- "jump around" the function body, to label SHIFT_END. (Remember,
- we are defining the function now, not executing it; falling into
- the function body would be an error.) The function is now de-
- fined, and ready to be used.
-
- In general, functions should be prepared in this form:
-
- * Fname - Description of use
-
- DEFINE('Fname(arg1,...,argn)local1,...,localn')
- . . .
- * Any one-time initialization for Fname
- . . . :(Fname_END)
-
- Fname Function body
- . . .
- Fname_END
-
- If you place your functions in individual disk files, they can
- be included in new programs as necessary. By preparing functions
- in this form, they will all be defined and initialized when exe-
- cution begins.
-
- When discussing pattern matching, we used a pattern to convert
- a character to its ASCII decimal value. In BASIC, two functions
-
-
-
- Tutorial - 67 - Program-Defined Objects
-
-
-
-
-
- are provided for similar operations: ASC and CHR$. We can create
- SNOBOL4 equivalents like this:
-
- * ASC(S) - Return the ASCII code for the first character of
- * string S.
- *
- * The value returned is an integer between 0 and 255.
- * The function fails if S is null.
-
- DEFINE('ASC(S)C')
- ASC_ONE = LEN(1) . C
- ASC_PAT = BREAK(*C) @ASC :(ASC_END)
-
- ASC S ASC_ONE :F(FRETURN)
- &ALPHABET ASC_PAT :(RETURN)
- ASC_END
-
- * CHR(N) - Converts an integer ASCII code to a one character
- * string.
- *
- * The argument N is an integer between 0 and 255.
- * The function fails if N is greater than 255.
-
- DEFINE('CHR(N)')
- CHR_PAT = TAB(*N) LEN(1) . CHR :(CHR_END)
-
- CHR &ALPHABET CHR_PAT :S(RETURN) F(FRETURN)
- CHR_END
-
- Note that both functions were written to work correctly regard-
- less of the anchoring mode in use by the calling program.
-
- (The CHR function is shown here as an example only. Vanilla
- SNOBOL4 provides a built-in function, CHAR(N), for this purpose.
- See Chapter 19, "Built-in Functions.")
-
-
- 8.1.8 Call by Value and Call by Name
-
- Function calls in SNOBOL4 transmit the "value" of the argument
- to the function. Variables used in the function call cannot be
- harmed by the function. This type of function usage is referred
- to as "call by value." Occasionally, we might want the function
- to access the argument variables themselves. The name operator
- introduced in the previous chapter provides this ability. The
- function call still transmits a value, but the value used is the
- "name" of a variable.
-
- Consider a function called SWAP, which will exchange the con-
- tents of two variables. If we wanted to exchange the contents of
- variables COUNT and OLDCOUNT, we would say SWAP(.COUNT,
- .OLDCOUNT). The function looks like this:
-
-
-
-
-
- Tutorial - 68 - Program-Defined Objects
-
-
-
-
-
- * SWAP(.V1, .V2) - Exchange the contents of two variables.
- * The variables must be prefixed with the name operator
- * when the function is called.
-
- DEFINE('SWAP(X,Y)TEMP') :(SWAP_END)
-
- SWAP TEMP = $X
- $X = $Y
- $Y = TEMP :(RETURN)
- SWAP_END
-
- The name operator allows us to access the argument variables.
- If we had not used it, the function would be called with the
- variables' values, with no indication of where they came from.
- Calls to SWAP are not limited to simple variable arguments. Any-
- thing capable of receiving the name operator, such as array and
- table elements, could be used: SWAP(.A<4,3>, .T<'YOU'>).
-
- There are certain situations where call by name occurs implic-
- itly. If the argument is an array or table name, or a program-
- defined data type (discussed below), it points to the actual data
- object, which can then be modified by the function. For example,
- if FILL were a function which loads an array with values read
- from a file, the statements
-
- A = ARRAY(25)
- FILL(A)
-
- would cause array A to be altered.
-
-
- 8.1.9 Functions and CODE.SNO
-
- The CODE.SNO program was provided to allow interactive experi-
- ments with SNOBOL4 statements. If you create functions using the
- preceding format, they also can be tested using CODE.SNO.
-
- Use your text editor to create a disk file containing the SHIFT
- function. (Be certain to include the GOTO that transfers around
- the function body.) Call the file SHIFT.SNO. Now, start the
- CODE.SNO program, and type the following:
-
- ? SLOAD('SHIFT.SNO')
- Success
- ? OUTPUT = SHIFT('COTTON',4)
- ONCOTT
- ? OUTPUT = SHIFT('OAK',4)
- Failure
-
-
-
-
-
-
-
-
-
- Tutorial - 69 - Program-Defined Objects
-
-
-
-
-
- 8.1.10 Recursive Functions
-
- The statements that comprise a function are free to call any
- functions they choose, including the function they are defining.
- Of course, for this to make sense, they must call themselves with
- a simplified version of the original problem, or an endless loop
- would result. Eventually, the function calls itself with an arg-
- ument so simple that it can return an answer without any further
- recursive calls. It's like winding a clock spring up. The
- central, non-recursive answer to the innermost call provides an
- answer to the next turn out, with the recursive calls unwinding
- until the original problem can be solved.
-
- There is no explicit declaration for recursion; any function
- can be used recursively if it is designed properly. However, all
- local variables should be declared in the DEFINE function so they
- will be saved and restored during recursive calls.
-
- Sometimes, recursion can produce dramatically smaller programs.
- "Algorithms in SNOBOL4" provides a example with the recursive
- function, ROMAN. It convert's an integer in the range 0 to 3999
- to its Roman numeral equivalent. Two premises are required:
-
- 1. We know the Roman numerals for the numbers 0 to 9 (null, I,
- II, ..., IX), and can perform this conversion with a simple
- pattern match.
-
- 2. We can use the REPLACE function to "multiply" a number in
- Roman form by 10 by replacing I by X, V by L, X by C, etc.
-
- The function uses these two rules to produce a recursive solu-
- tion for some integer N. The algorithm looks like this:
-
- The rightmost digit is removed from the argument and
- converted by premise 1. Removing the digit effectively
- divides the argument by 10, simplifying the problem.
-
- The reduced argument is then converted by calling ROMAN
- recursively and "multiplying" the result by 10 accord-
- ing to premise 2.
-
- The previously converted unit's digit is appended to
- the result.
-
- Here's the function (note that a "plus sign" in column one allows
- a statement to be continued over several lines):
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 70 - Program-Defined Objects
-
-
-
-
-
- * ROMAN(N) - Convert integer N to Roman numeral form.
- *
- * N must be positive and less than 4000.
- *
- * An asterisk appears in the result if N >= 4000.
- *
- * The function fails if N is not an integer.
-
- DEFINE('ROMAN(N)UNITS') :(ROMAN_END)
-
- * Get rightmost digit to UNITS and remove it from N.
- * Return null result if argument is null.
- ROMAN N RPOS(1) LEN(1) . UNITS = :F(RETURN)
-
- * Search for digit, replace with its Roman form.
- * Return failing if not a digit.
- '0,1I,2II,3III,4IV,5V,6VI,7VII,8VIII,9IX,' UNITS
- + BREAK(',') . UNITS :F(FRETURN)
-
- * Convert rest of N and multiply by 10. Propagate a
- * failure return from recursive call back to caller.
- ROMAN = REPLACE(ROMAN(N), 'IVXLCDM', 'XLCDM**')
- + UNITS :S(RETURN) F(FRETURN)
- ROMAN_END
-
- The first call to ROMAN may have an integer argument. The
- statement labeled ROMAN causes N to be converted to a string, and
- subsequent recursive calls use a string argument. The recursive
- calls cease when reducing N finally produces a null string
- argument---the match at statement ROMAN fails, and the function
- returns immediately with a null result.
-
-
- 8.2 PROGRAM-DEFINED DATA TYPES
-
- With the exception of arrays and tables, a variable may have
- only one item of data in it at a time. In many applications, it
- is convenient if several data items can be associated with a
- variable. For example, if we wanted to work with complex num-
- bers, a variable should contain two numbers---the real and imagi-
- nary parts. In an inventory system, an individual product might
- require values such as name, price, quantity, and manufacturer.
-
- Program-defined data types enlarge SNOBOL4's repertoire to
- include new objects such as COMPLEX or PRODUCT. SNOBOL4 only
- provides a system for managing these new types; defining a data
- type does not magically invest SNOBOL4 with a knowledge of com-
- plex arithmetic or inventory accounting. It is still up to you
- to provide the computational support for each new type.
-
-
- 8.2.1 Data Type Definition
-
- A program-defined data type will consist of a number of fields,
-
-
-
- Tutorial - 71 - Program-Defined Objects
-
-
-
-
-
- each containing an individual data element. We begin by select-
- ing names for the data type and fields. An inventory system
- might use the data type name PRODUCT, and field names NAME,
- PRICE, QUANTITY, and MFG.
-
- A data type is defined by providing a prototype string to the
- built-in DATA function. The prototype assumes a form similar to
- a function call, with the data type taking the place of the func-
- tion name, and the field names replacing the arguments. The form
- of the prototype string is:
-
- 'TYPENAME(FIELD1,FIELD2,...,FIELDn)'
-
- Blanks are not permitted within a prototype. Try creating a
- new data type using the CODE.SNO program:
-
- ? DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG)')
- Success
-
- The DATA function tells SNOBOL4 to define an object creation
- function with the new data type's name:
-
- PRODUCT(arg1, arg2, arg3, arg4)
-
- This new function can be called whenever we wish to create a
- new object with the PRODUCT data type. Its arguments are the
- initial values to be given to the four fields which comprise a
- PRODUCT. The function returns a pointer to the new object, which
- can be stored in a variable, array, or table. Try creating two
- new objects as follows:
-
- ? ITEM1 = PRODUCT('CAPERS', 2, 48, 'BRINE BROTHERS')
- ? ITEM2 = PRODUCT('PICKLES', 1, 72, 'PETER PIPER INC.')
-
-
- 8.2.2 Data Type Use
-
- The defining call to the DATA function also created several
- field reference functions. In this case, their names would be:
-
- NAME(arg) PRICE(arg) QUANTITY(arg) MFG(arg)
-
- The argument used with each function is an object created by
- the PRODUCT function. Try accessing ITEM1's fields:
-
- ? OUTPUT = MFG(ITEM1)
- BRINE BROTHERS
- ? OUTPUT = PRICE(ITEM1) * QUANTITY(ITEM1)
- 96
-
- We can alter the value of a field after an object is created.
- Field reference functions can also be used as the object of an
- assignment, so:
-
-
-
-
- Tutorial - 72 - Program-Defined Objects
-
-
-
-
-
- ? QUANTITY(ITEM2) = QUANTITY(ITEM2) - 12
-
- changes the QUANTITY field of ITEM2 from 72 to 60.
-
-
- 8.2.3 Copying Data Items
-
- It is important to recognize that variables like ITEM1 and
- ITEM2 contain "pointers" to the data. Assigning ITEM1 to another
- variable, say LASTITEM, merely copies the pointer; both variables
- still point to the same physical packet of data in memory.
- Altering the QUANTITY field of ITEM1 would alter the QUANTITY
- field of LASTITEM. This is the same behavior observed earlier
- for array and table names.
-
- The built-in COPY function creates a unique copy of an object--
- one which is independent of the original. Try using it with
- CODE.SNO:
-
- ? LASTITEM = COPY(ITEM1)
- ? QUANTITY(ITEM1) = 24
- ? OUTPUT = QUANTITY(LASTITEM)
- 48
-
-
- 8.2.4 Creating Structures
-
- Our inventory example used string and integer values as the
- field contents. In fact, any SNOBOL4 data type may be stored in
- a field, including pointers to other program-defined types. Com-
- plex structures, such as queues, stacks, trees, and arbitrary
- graphs may be created.
-
- For example, if we wanted to link together all products made by
- the same manufacturer, PRODUCT could be defined with an addi-
- tional field. We won't go through the exercise with CODE.SNO,
- but will sketch out the changes:
-
- DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG,MFGLINK')
-
- As each product is defined, we will determine if we have
- another product from the same manufacturer. If so, MFGLINK is
- set to point to that other product. If not, it is set to the
- null string. A table M provides a convenient way to keep track
- of manufacturers. Assume variable COMPANY contains the manufac-
- turer's name as each product is defined. Then all of the requi-
- site searching and linking can be accomplished in one statement:
-
- M<COMPANY> = PRODUCT(..., ..., ..., COMPANY, M<COMPANY>)
-
- If this is the company's first appearance, it is not in the
- table, and the last argument to the PRODUCT function sets MFGLINK
- to the null string. The assignment statement uses the company as
- the table subscript, and the entry points to the current product.
-
-
-
- Tutorial - 73 - Program-Defined Objects
-
-
-
-
-
- If another product definition uses the same company, MFGLINK will
- point to the previous product, and the table will be updated to
- point to the current product. In this manner, all products from
- a manufacturer will be threaded together. Each thread starts
- with a table entry, and goes through each product's MFGLINK
- field, ending with a null string in the last product's MFGLINK.
-
- Now if we wanted to display all products supplied by a particu-
- lar manufacturer, we select and follow the appropriate thread:
-
- X = M<COMPANY>
- LOOP OUTPUT = DIFFER(X) NAME(X) :F(DONE)
- X = MFGLINK(X) :(LOOP)
- DONE
-
-
- 8.2.5 The DATATYPE Function
-
- The DATATYPE function allows you to learn the type of data in a
- particular variable. It is useful when the kind of processing to
- be performed depends on the data type. The formal data type name
- is returned as an upper-case string:
-
- ? OUTPUT = DATATYPE(54)
- INTEGER
- ? OUTPUT = DATATYPE(ITEM1)
- PRODUCT
-
-
- 8.3 PROGRAM-DEFINED OPERATORS
-
- If you can define new functions and data types, why not new
- operators too? Indeed, SNOBOL4 provides this feature, although
- most programs can be written without it. For the sake of com-
- pleteness, we'll provide a brief discussion.
-
-
- 8.3.1 Operators and Functions
-
- Unary or binary operators can be thought of as functions of one
- or two arguments. For example, A + B can be written in func-
- tional form as PLUS(A,B), where PLUS is some function which im-
- plements addition. Operators can be redefined by specifying a
- function to replace them. We still write our program in terms of
- the operator's graphic symbol, but SNOBOL4 will use the specified
- function whenever the operator must be performed.
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 74 - Program-Defined Objects
-
-
-
-
-
- The built-in function OPSYN creates synonyms and new defini-
- tions for operators. Synonyms permit different names or symbols
- to be used in place of a function or operator. The general form
- of OPSYN is:
-
- OPSYN(new name, old name, i)
-
- The new name is defined as a synonym of the old name. The
- third argument is 0, 1, or 2 if we are defining functions, unary
- operators, or binary operators respectively.
-
-
- 8.3.2 Function Synonyms
-
- We can make the name LENGTH a synonym for the SIZE function:
-
- ? OPSYN('LENGTH', 'SIZE', 0)
- ? OUTPUT = LENGTH('RABBIT')
- 6
-
- The word synonym is not quite an accurate description of OPSYN.
- The name LENGTH becomes associated with the "code" that imple-
- ments the SIZE function, not with the word SIZE per se. If SIZE
- was subsequently redefined---perhaps as a program-defined
- function--LENGTH would continue to return the number of
- characters in a string.
-
-
- 8.3.3 Operator Synonyms
-
- Take a moment to examine the tables in Chapter 15, "Operators,"
- in the reference section. Note that in each table there are a
- number of operator symbols whose definition is <none>.
-
- If you use an undefined binary operator, you'll get an error:
-
- ? OUTPUT = 1 # 1
- Execution error #5, Undefined function or operation
-
- However, we could make this operator synonymous with the DIFFER
- function (which also uses two arguments) and use it instead:
-
- ? OPSYN('#', 'DIFFER', 2)
- ? OUTPUT = 1 # 2
- Failure
-
- Conversely, we can define a function in place of an operator:
-
- ? OPSYN('PLUS', '+', 2)
- ? OUTPUT = PLUS(4, 5)
- 9
-
-
-
-
-
-
- Tutorial - 75 - Program-Defined Objects
-
-
-
-
-
- Unary operators can be similarly treated, using 1 as the third
- argument:
-
- ? OPSYN('!', 'ANY', 1)
- ? 'ABC321' !'3C' . OUTPUT
- C
-
- Operators can be created to maintain a stack, or navigate
- around a tree. The full generality of functions and program-
- defined data types are available to your operators. Through this
- technique you can make SNOBOL4 speak the language of your
- particular problem.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 76 - Program-Defined Objects
-
-
-
-
-
- Chapter 9
-
-
- ADVANCED TOPICS
- -----------------------------------------------------------------
-
- The material presented so far allows you to write powerful
- SNOBOL4 programs. In this chapter, we will examine other inter-
- esting and useful features of the language.
-
-
- 9.1 THE ARBNO FUNCTION
-
- This function produces a pattern which will match zero or more
- consecutive occurrences of the pattern specified by its argument.
- As its name implies, ARBNO is useful when an arbitrary number of
- instances of a pattern may occur. For example, ARBNO(LEN(3))
- matches strings of length 0, 3, 6, 9, .... There is no restric-
- tion on the complexity of the pattern argument.
-
- Like the ARB pattern, ARBNO is shy, and tries to match the
- shortest possible string. Initially, it simply matches the null
- string. If a subsequent pattern component fails to match,
- SNOBOL4 backs up, and asks ARBNO to try again. Each time ARBNO
- is retried, it supplies another instance of its argument pattern.
- In other words, ARBNO(PAT) behaves like
-
- ( '' | PAT | PAT PAT | PAT PAT PAT | ... )
-
- Also like ARB, ARBNO is usually used with adjacent patterns to
- "draw it out." Let's consider a simple example. We want to
- write a pattern to test for a list. We'll define a list as being
- one or more numbers separated by comma, and enclosed by parenthe-
- ses. Use CODE.SNO to try this definition:
-
- ? ITEM = SPAN('0123456789')
- ? LIST = POS(0) '(' ITEM ARBNO(',' ITEM) ')' RPOS(0)
- ? '(12,345,6)' LIST
- Success
- ? '(12,,34)' LIST
- Failure
-
- ARBNO is retried and extended until its subsequent, ')', fi-
- nally matches. POS(0) and RPOS(0) force the pattern to be ap-
- plied to the entire subject string.
-
- Alternation may be used within ARBNO's argument. This pattern
- matches any number of pairs of certain letters:
-
- ? PAIRS = POS(0) ARBNO('AA' | 'BB' | 'CC') RPOS(0)
- ? 'CCBBAAAACC' PAIRS
- Success
- ? 'AABBB' PAIRS
- Failure
-
-
-
- Tutorial - 77 - Advanced Topics
-
-
-
-
-
- 9.2 RECURSIVE PATTERNS
-
- This is the pattern analogue of a recursive function---a pat-
- tern is defined in terms of itself. The unevaluated expression
- operator makes the definition possible.
-
- Suppose we wanted to expand the previous definition of a list
- to say that a list item may be a span of digits, or another list.
- The definition proceeds as before, except that the unevaluated
- expression operator is used in the first statement; the concept
- of a list has not yet been defined:
-
- ? ITEM = SPAN('0123456789') | *LIST
- ? LIST = '(' ITEM ARBNO(',' ITEM) ')'
- ? TEST = POS(0) LIST RPOS(0)
- ? '(12,(3,45,(6)),78)' TEST
- Success
- ? '(12,(34)' TEST
- Failure
-
- Recursion occurs because LIST is defined in terms of ITEM,
- which is defined in terms of LIST, and so on. Note that func-
- tions POS(0) and RPOS(0) were "moved out one level," to TEST, be-
- cause LIST must now match substrings within the subject.
-
- In our previous discussion of recursive functions, we said they
- work because successive calls present the function with progres-
- sively simpler problems, until the problem can be solved without
- further recursion. Similarly, patterns ITEM and LIST are applied
- to successively smaller substrings, until ITEM can use its SPAN()
- alternative instead of invoking LIST again.
-
- In general, you will need an alternative somewhere in the re-
- cursive loop to allow the pattern matcher "a way out." Also, you
- should place recursive objects last in a series of alternatives,
- so that the simpler, nonrecursive patterns are attempted first
- and "recursive plunges" can be avoided.
-
- SNOBOL4 saves information on a "pattern stack" during the pat-
- tern match process. Heavily recursive patterns and long subject
- strings can sometimes result in stack overflow. If this occurs,
- you should break the problem apart into several smaller pattern
- matches.
-
- As recursive patterns use the unevaluated expression operator,
- it is sometimes necessary to disable SNOBOL4's heuristics by
- setting &FULLSCAN = 1.
-
-
- 9.3 QUICKSCAN AND FULLSCAN
-
- Pattern matching can be very time-consuming because of the num-
- ber of possibilities which must be attempted. In the normal
- "quickscan" mode, SNOBOL4 stops searching for a match when it
-
-
-
- Tutorial - 78 - Advanced Topics
-
-
-
-
-
- thinks further efforts would be futile. The heuristics are com-
- plex, but can be summarized as follows: pattern matching fails
- when there are insufficient subject characters to satisfy the re-
- maining pattern components.
-
- The cursor operator can be used to demonstrate at what point
- SNOBOL4 gives up. For example, in the pattern match
-
- ? 'ABCD' @OUTPUT 'X' LEN(3)
- 0
- Failure
-
- SNOBOL4 does not attempt to match 'X' against 'B' because fewer
- than 3 subject characters remain after it, and LEN(3) could never
- succeed.
-
- A second type of heuristic is the "one character assumption"
- for unevaluated expressions. SNOBOL4 assumes that unevaluated
- expressions will match at least one character. This heuristic
- was originally provided to break recursive loops, but can cause
- programming problems when an unevaluated expression must match
- the null string. Consider a pattern which succeeds if 'B' is at
- least 4 character positions beyond an 'A' in the subject:
-
- ? P = 'A' ARB $ X 'B' *GE(SIZE(X), 4)
- ? 'A12345BC' P
- Success
- ? 'A12345B' P
- Failure
-
- The characters between 'A' and 'B' are matched by ARB, and im-
- mediately assigned to X. The size of X is then compared to 4 by
- the GE function, which succeeds and returns the null string.
- This null string result should not interfere with the pattern
- match, but we find the pattern misbehaves when 'B' is the last
- character of the subject. The unevaluated expression operator
- made SNOBOL4 assume a one character length for the GE function,
- and matching 'B' against the last subject character was never at-
- tempted.
-
- For most pattern matching, heuristics are invisible. However,
- there are circumstances when we would like SNOBOL4 to be exhaus-
- tive in its match attempts. We can disable heuristics and enter
- "fullscan" mode by setting keyword &FULLSCAN nonzero:
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 79 - Advanced Topics
-
-
-
-
-
- ? &FULLSCAN = 1
- ? 'A12345B' P
- Success
- ? 'ABCD' @OUTPUT 'X' LEN(3)
- 0
- 1
- 2
- 3
- 4
- Failure
-
- The quickscan mode can be reinstated by setting &FULLSCAN = 0.
-
-
- 9.4 OTHER PRIMITIVE PATTERNS
-
- We can accomplish quite a lot with just the primitive patterns
- ARB and REM. However, there are five additional patterns which
- you should be aware of:
-
- -----------------------------------------------------------------
-
- ABORT End pattern match
-
- The ABORT pattern causes immediate failure of the entire pat-
- tern match, without seeking other alternatives. Usually a match
- succeeds when we find a subject sequence which satisfies the pat-
- tern. The ABORT pattern does the opposite: if we find a certain
- pattern, we will abort the match and fail immediately. For exam-
- ple, suppose we are looking for an 'A' or 'B', but want to fail
- if '1' is encountered first:
-
- ? '--AB-1-' (ANY('AB') | '1' ABORT)
- Success
- ? '--1B-A-' (ANY('AB') | '1' ABORT)
- Failure
-
- The last example may be confusing because the ANY function ap-
- pears as the first alternative, fostering the illusion that it
- will find the 'B' in the subject before the other pattern alter-
- native is tried. However, that is not the order of pattern
- matching; ALL pattern alternatives are tried at cursor position
- zero in the subject. If none succeed, the cursor is advanced by
- one, and all alternatives are tried again. When the cursor is in
- front of subject character '1', ANY still does not match, but the
- second alternative now does. As the '1's match, ABORT is
- reached, causing failure.
-
-
-
-
-
-
-
-
-
-
- Tutorial - 80 - Advanced Topics
-
-
-
-
-
- -----------------------------------------------------------------
-
- BAL Match balanced string
-
- The BAL pattern matches the shortest nonnull string in which
- parentheses are balanced. (A string without parentheses is also
- considered to be balanced.) These strings are balanced:
-
- (X) Y (A!(C:D)) (AB)+(CD) 9395
-
- These are not:
-
- )A+B (A*(B+) (X))
-
- BAL is concerned only with left and right parentheses. The
- matching string does not have to be a well-formed expression in
- the algebraic sense; in fact, it needn't be an algebraic expres-
- sion at all. Like ARB, BAL is most useful when constrained on
- both sides by other pattern components.
-
- -----------------------------------------------------------------
-
- FAIL Seek other alternatives
-
- The FAIL pattern signals failure of this portion of the pattern
- match, causing the pattern matcher to backtrack and seek other
- alternatives. FAIL will also suppress a successful match, which
- can be very useful when the match is being performed for its side
- effects, such as immediate assignment. For example, in unan-
- chored mode, this statement will display the subject characters,
- one per line:
-
- SUBJECT LEN(1) $ OUTPUT FAIL
-
- LEN(1) matches the first subject character, and immediately as-
- signs it to OUTPUT. FAIL tells the pattern matcher to try again,
- and since there are no other alternatives, the entire match is
- retried at the next subject character. Forced failure and re-
- tries continue until the subject is exhausted.
-
- -----------------------------------------------------------------
-
- FENCE Prevent match retries
-
- Pattern FENCE matches the null string and has no effect when
- the pattern matcher is moving left to right in a pattern. How-
- ever, if the pattern matcher is backing up to try other alterna-
- tives, and encounters FENCE, the match fails.
-
- FENCE can be used to "lock in" an earlier success. Suppose we
- want to succeed if the first 'A' or 'B' in the subject is fol-
- lowed by a plus sign. In the following example, the 'A's match,
- we go through the FENCE, and find '+' does not match the next
- subject character, 'B'. SNOBOL4 tries to backtrack, but is
-
-
-
- Tutorial - 81 - Advanced Topics
-
-
-
-
-
- stopped by the FENCE and fails:
-
- ? '1AB+' ANY('AB') FENCE '+'
- Failure
-
- If FENCE were omitted, backtracking would match ANY to 'B', and
- then proceed forward again to match '+'.
-
- If FENCE appears as the first component of a pattern, SNOBOL4
- cannot back up through it to try another subject starting posi-
- tion. This results in an anchored pattern, even if the &ANCHOR
- keyword specifies unanchored mode:
-
- ? 'ABC' FENCE 'B'
- Failure
-
- -----------------------------------------------------------------
-
- SUCCEED Match always
-
- This pattern matches the null string and always succeeds. If
- the scanner is backtracking when it encounters SUCCEED, it re-
- verses and starts forward again. Placing a pattern between
- SUCCEED and FAIL causes the pattern matcher to oscillate.
-
-
- 9.5 OTHER FUNCTIONS
-
- I'd like to briefly point out a few more built-in functions.
- See Chapter 19 for a complete description of their use.
-
- APPLY Allows an indirect call to a function through
- a variable.
-
- CONVERT Provides explicit conversion from one data
- type to another. Chapter 17, "Data Types and
- Conversion," describes the conversions
- possible.
-
- ENDFILE Closes a file and detaches all variables
- associated with it.
-
- ITEM Allows an indirect reference to an array or
- table.
-
- LPAD & RPAD These are padding functions, which will pad a
- string on its left or right side with blanks
- or a given character. Padding is provided to
- a specified width, and is useful when produc-
- ing columnar output.
-
-
-
-
-
-
-
- Tutorial - 82 - Advanced Topics
-
-
-
-
-
- 9.6 OTHER UNARY OPERATORS
-
-
- Operation: Negation
- Symbol: ~ (tilde)
-
- The negation operator, or tilde (~), inverts the success or
- failure result of its operand. If the expression X succeeds,
- then ~X fails. Conversely, if X fails, ~X succeeds and returns
- the null string.
-
-
- Operation: Interrogation
- Symbol ? (question mark)
-
- Unary question mark is called the interrogation operator, al-
- though "value annihilation" might be more descriptive. If X is
- an expression which fails, ?X also fails. However, if X suc-
- ceeds, ?X also succeeds, and returns the null string. In other
- words, any value component of X is replaced by the null string.
-
-
- 9.7 RUN-TIME COMPILATION
-
- The two functions described below are among the most esoteric
- features, not just of SNOBOL4, but of all programming languages
- in existence. While your program is executing, the entire
- SNOBOL4 compiler is just a function call away.
-
- A SNOBOL4 program is nothing more than a string of characters.
- The functions EVAL and CODE let you supply the compiler with
- character strings from within the program itself.
-
-
- 9.7.1 The EVAL Function
-
- This function is used to evaluate an expression. Its argument
- may take a number of forms:
-
- 1. If the argument is an integer, or a number in string form,
- the number is returned as the function result:
-
- ? OUTPUT = EVAL(19)
- 19
-
- 2. If the argument is an unevaluated expression, it is evalu-
- ated using current values for any variables it might con-
- tain. EVAL returns the expression's value as its result:
-
- ? E = *('N SQUARED IS ' N ** 2)
- ? N = 15
- ? OUTPUT = EVAL(E)
- N SQUARED IS 225
-
-
-
-
- Tutorial - 83 - Advanced Topics
-
-
-
-
-
- This is similar to our earlier use of unevaluated expres-
- sions with patterns. In this case, however, the unevaluated
- expression operator (*) must be applied to the entire ex-
- pression to create an object with the EXPRESSION data type.
-
- 3. If the argument is a string (other than a simple number),
- EVAL tries to compile it as a SNOBOL4 expression. Only an
- expression is permitted---not an entire SNOBOL4 statement:
-
- ? OUTPUT = EVAL('3 * N + 2')
- 47
-
- If the string compiles without error, EVAL then evaluates
- the expression and returns the result.
-
- It is this last use of EVAL---to compile a string---which is
- the most interesting. Here is a trivial program which behaves
- like a simple desk calculator.
-
- LOOP OUTPUT = EVAL(INPUT) :S(LOOP)
- END
-
- You can easily try it by placing a semicolon after the GOTO to
- protect it from CODE.SNO's own machinations:
-
- ?LOOP OUTPUT = EVAL(INPUT) :S(LOOP);
- 4 * (5 - 2) / 2
- 6
- N + 1
- 16
- ^Z
-
- The program reads a line of input, compiles and evaluates it,
- and displays the result. Each expression you enter must be well-
- formed according to SNOBOL4's syntax rules. In particular, this
- means there must be blanks around the binary operators.
-
- The BNF program included with Vanilla SNOBOL4 demonstrates that
- EVAL's power is useful even if the input data does not conform to
- SNOBOL4 syntax. It reads a definition of a grammar from a file,
- and converts it to SNOBOL4 patterns.
-
- EVAL fails if evaluation of the argument fails, or if the argu-
- ment contains a syntax error. The SNOBOL4 keyword &ERRTEXT will
- contain a string describing the error.
-
- The expressions used with EVAL may return any SNOBOL4 data
- type, not just numbers. For instance, the expression might con-
- struct a new pattern, and return it as the result:
-
- ITEM = EVAL('SPAN("0123456789") | *LIST')
-
- Note that EVAL can only call the compiler with a string argu-
- ment. If we used a pattern as the argument, we would produce an
-
-
-
- Tutorial - 84 - Advanced Topics
-
-
-
-
-
- execution error:
-
- ITEM = EVAL(SPAN("0123456789") | *LIST) (incorrect)
-
-
- 9.6.2 The CODE Function
-
- CODE accepts a string argument containing one or more state-
- ments to be compiled. Multiple statements are separated by
- semicolons (;). Statements may be labeled, and can include all
- the usual components---subject, pattern, replacement, and GOTO.
- However, comment and continuation statements are not permitted.
-
- The CODE function compiles the statements, and returns a poin-
- ter to the resulting object code. It fails if any statement
- contains an error, and places an error message in &ERRTEXT.
-
- There are two ways to execute the new object code.
-
- 1. Transfer to a label which is defined in the new code:
-
- * Compile a sample piece of code:
- S = 'L OUTPUT = N; N = LT(N,10) N + 1 :S(L)F(DONE)'
- CODE(S)
- * Transfer to a label in it:
- :(L)
- * Come here when the new code transfers back.
- DONE . . .
-
- Notice how we placed a GOTO from the new code back to label
- DONE in the main program. If we had not done this, SNOBOL4
- would terminate when execution "fell out of the bottom" of
- the new code block.
-
- 2. The pointer returned by the CODE function can be used in a
- "direct GOTO" to transfer to the first statement in the code
- block. A direct GOTO is performed by enclosing the pointer
- in angular brackets in the GOTO field:
-
- * Compile a sample piece of code:
- S = 'L OUTPUT = N; N = LT(N,10) N + 1 :S(L)F(DONE)'
- C = CODE(S)
- * Transfer to the first statement in the block:
- :<C>
- DONE . . .
-
- Labels contained in the new program fragment override any
- labels of the same name in your main program. This provides the
- ability to write self-modifying SNOBOL4 programs, and makes the
- division between "code" and "data" far less distinct than in
- other high-level languages.
-
-
-
-
-
-
- Tutorial - 85 - Advanced Topics
-
-
-
-
-
- Chapter 10
-
-
- DEBUGGING AND PROGRAM EFFICIENCY
- -----------------------------------------------------------------
-
-
- 10.1 DEBUGGING AND TRACING
-
- You are probably well aware of the diversity of potential er-
- rors when writing computer programs. They range from simple
- typographical errors made while entering a program, to subtle de-
- sign problems which may only be revealed by unexpected input
- data.
-
- Debugging a SNOBOL4 program is not fundamentally different than
- debugging programs written in other languages. However,
- SNOBOL4's syntactic flexibility and lack of type declarations for
- variables produce some unexpected problems. By way of compensa-
- tion, an unusually powerful trace capability is provided.
-
- Of course, there may come a time when you can't explain your
- program's behavior, and decide "the system" is at fault. No
- guarantee can ever be made that SNOBOL4 is completely free of
- errors. However, its internal algorithms have been in use in
- other SNOBOL4 systems since 1967, and all known errors have been
- removed. Often the problem is a misunderstanding of how a func-
- tion works with exceptional data, and a close reading of the ref-
- erence section clears the problem up. In short, suspect the
- system last.
-
-
- 10.1.1 Compilation Errors
-
- Compilation errors are the simplest to find; SNOBOL4 displays
- the erroneous line on your screen with its statement number, and
- places a marker below the point where the error was encountered.
- The source file name, line number, and column number of the error
- are displayed for use by your text editor. Only the first error
- in a statement is identified, so you should also carefully check
- the remainder of the statement. A typical line looks like this:
-
- 32 ,OUTPUT = CNT+ 1
- ^
- test.sno(57,10) : Compilation Error : Erroneous statement
-
- Here, the comma preceding the word OUTPUT is misplaced. The
- message indicates that ",OUTPUT" is not a valid language element.
-
- Programs containing compilation errors can still be run, at
- least until a statement containing an error is encountered. When
- that happens, SNOBOL4 will produce an execution error message,
- and stop.
-
-
-
-
- Tutorial - 86 - Debugging and Efficiency
-
-
-
-
-
- A complete description of error messages is provided in Chapter
- 20, "System Messages."
-
-
- 10.1.2 Execution Errors
-
- Once a program compiles without error, testing can begin. Two
- kinds of errors are possible: SNOBOL4 detectable errors, such an
- incorrect data type or calling an undefined function, and program
- logic errors that produce incorrect results.
-
- With the first type of error, you'll get a SNOBOL4 error mes-
- sage with statement and line numbers. Inspecting the offending
- line will often reveal typing errors, such as a misspelled func-
- tion name, keyword, or label. If the error is due to incorrect
- data in a variable---such as trying to perform arithmetic on a
- non-numeric string---you'll have to start debugging to discover
- how the incorrect data was created. Placing output statements in
- your program, or using the trace techniques described below, will
- usually find such errors.
-
- Here are some common errors to look for first:
-
- 1. Setting keywords &ANCHOR, &FULLSCAN, and &TRIM improperly.
- We may have written a program with anchored pattern matching
- in mind, but let an unanchored match slip in inadvertently.
- Forgetting to set &TRIM to 1 causes blanks to be appended to
- input lines, and they usually interfere with pattern match-
- ing and conversion of a string to an integer.
-
- 2. Misspelled variable names. Using PUTPUT instead of OUTPUT,
- as in:
-
- PUTPUT = LINE1
-
- creates a new variable and assigns LINE1 to it. Worse still
- is using a misspelled name as a value source, since it will
- return a null string value.
-
- The first type of error is relatively easy to find---produce
- an end-of-run dump by using the SNOBOL4 command line option
- /D. You can study the list of variables for an unexpected
- name. The second type of error is naturally much harder to
- find, because variables with null string values are omitted
- from the end-of-run dump. In this case, you will have to
- study the source program closely for misspellings.
-
- 3. Spurious spaces between a function name and its argument
- list. A line like:
-
- LINE = TRIM (INPUT)
-
- is not a call to the TRIM function. The blank between TRIM
- and the left parenthesis is interpreted as concatenating
-
-
-
- Tutorial - 87 - Debugging and Efficiency
-
-
-
-
-
- variable TRIM with the expression (INPUT). TRIM used as a
- variable is likely to be the null string, so INPUT is
- returned unchanged.
-
- 4. No blank space after a binary operator. SNOBOL4 sees a
- unary operator instead, with completely unexpected results.
- For instance:
-
- X = Y -Z
-
- concatenates Y with the expression -Z.
-
- 5. Confusion occurring when a variable contains a number in
- string form. When used as an argument to most functions,
- conversion from string to number is automatic, and proper
- execution results. However, functions IDENT and DIFFER do
- not convert their arguments, and seemingly equal values are
- thought to be different. For example, if we want to test an
- input line for the number 3, the statements:
-
- N = INPUT
- IDENT(N, 3) :S(OK)
-
- are not correct. N contains a string, which is a different
- data type from the integer 3. This could be corrected by
- using IDENT(+N, 3), or EQ(N, 3). Once again, &TRIM should
- be 1, or the blanks appended to N will prevent its conver-
- sion to an integer.
-
- 6. Omitting the assignment operator when we wish to remove the
- matching substring from a subject, resulting in a program
- which loops forever. For example, our word-counting program
- replaced each word with the null string:
-
- NEXTWRD LINE WRDPAT = :F(READ)
-
- However, by omitting the equal sign we would repeatedly find
- the same first word in LINE:
-
- NEXTWRD LINE WRDPAT :F(READ)
-
- 7. Unexpected statement failure, with no provision for detect-
- ing it in the GOTO field. For example, the CONVERT function
- fails if the table being converted is empty:
-
- RESULT = CONVERT(TALLY, "ARRAY")
-
- RESULT will not be set if CONVERT fails, and a subsequent
- array reference to RESULT would produce an execution error.
-
- 8. Failure can be detected but misinterpreted when there are
- several causes for it in a statement. This statement fails
- when an End-of-File is read, or if the input line does not
- contain any digits:
-
-
-
- Tutorial - 88 - Debugging and Efficiency
-
-
-
-
-
- INPUT SPAN('0123456789') . N :F(EOF)
-
- In the latter case, if we want to generate an error message,
- the statement should be split in two:
-
- N = INPUT :F(EOF)
- N SPAN('0123456789') . N :F(WARN)
-
- 9. Using operators such as alternation (|) and conditional as-
- signment (.) for purposes other than pattern construction.
- Using them in the subject field will produce an 'Illegal
- data type' error message. Using them in the replacement
- field produces a pattern, intended for subsequent use in a
- pattern match statement. For example, this statement sets N
- to a pattern; it does not replace it with the words 'EVEN'
- or 'ODD', as was probably intended:
-
- N = EQ(REMDR(N,2),0) 'EVEN' | 'ODD'
-
- We note in passing that SNOBOL4+, Catspaw's professional
- SNOBOL4 package, provides language extensions that allow
- just that:
-
- N = (EQ(REMDR(N,2),0) 'EVEN', 'ODD')
-
- 10 Forgetting that functions like TAB and BREAK bind subject
- characters. This won't matter for simple pattern matching,
- but for matching with replacement, problems can appear. For
- example, suppose we wanted to replace the 50th character in
- string S with '*'. If we used:
-
- S TAB(49) LEN(1) = '*'
-
- we would find the first 50 characters replaced by a single
- asterisk. Instead, we should say:
-
- S POS(49) LEN(1) = '*'
-
- or, even more efficiently:
-
- S TAB(49) . FRONT LEN(1) = FRONT '*'
-
- 11 Omitting the unevaluated expression operator when defining a
- pattern containing variable arguments. For example, the
- pattern
-
- NTH_CHAR = POS(*N - 1) LEN(1) . CHAR
-
- will copy the Nth subject character to variable CHAR. The
- pattern adjusts automatically if N's value is subsequently
- changed. Omitting the asterisk would capture the value of N
- at the time the pattern is defined (probably the null
- string).
-
-
-
-
- Tutorial - 89 - Debugging and Efficiency
-
-
-
-
-
- 10.1.3 Simple Debugging
-
- These simple methods should find a majority of your bugs:
-
- 1. Set keyword &DUMP nonzero, or use command line option /D to
- get an end-of-run dump. Examine it closely for reasonable
- values and variable names. Dumps can also be produced at
- any time during execution by calling the built-in function
- DUMP.
-
- 2. Use keyword &STLIMIT to end execution after a fixed number
- of statements.
-
- 3. Use the keyboard Control-C key to interrupt a program which
- is looping endlessly, and record the statement number.
-
- 4. Use the GOTO :F(ERROR) to detect unexpected failures and
- data errors. Do not define the label ERROR---SNOBOL4 will
- display the statement number of the error if an attempt is
- made to transfer to label ERROR.
-
- 5. Assign values to OUTPUT to monitor data values. Use immedi-
- ate assignment and cursor assignment (to OUTPUT) to observe
- the operation of a pattern match.
-
- 6. Produce end-of-run statistics with the command line option
- /S. Are the number and kind of operations reasonable?
-
- 7. Use the CODE.SNO program to setup simple test cases. This
- is particularly useful when pattern-matching statements do
- not behave as expected.
-
- More subtle errors can be pinpointed using SNOBOL4's trace fa-
- cility, described below.
-
-
- 10.2 EXECUTION TRACING
-
- Tracing the flow of control and data in a program is usually
- the best way to find difficult problems. SNOBOL4 allows tracing
- of data in variables and some keywords, transfers of control to
- specified labels, and function calls and returns. Two keywords
- control tracing: &FTRACE and &TRACE.
-
-
- 10.2.1 Function Tracing
-
- Keyword &FTRACE is set nonzero to produce a trace message each
- time a program-defined function is called or returns. The trace
- message displays the statement number where the action occurred,
- the name of the function, and the values of its arguments. Func-
- tion returns display the type of return and value, if any. Each
- trace message decrements &FTRACE by one, and tracing ends when
- &FTRACE reaches zero. A typical trace messages looks like this:
-
-
-
- Tutorial - 90 - Debugging and Efficiency
-
-
-
-
-
- STATEMENT 39: LEVEL 0 CALL OF SHIFT('SKYBLUE',3),TIME = 140
- STATEMENT 12: LEVEL 1 RETURN OF SHIFT = 'BLUESKY',TIME = 141
-
- The level number is the overall function call depth. The pro-
- gram execution time in tenths of a second is also provided.
-
-
- 10.2.2 Selective Tracing
-
- Keyword &TRACE will also produce trace messages when it is set
- nonzero. However, the TRACE function must be called to specify
- what is to be traced. Tracing can be selectively ended by using
- the STOPTR function. The TRACE function call takes the form:
-
- TRACE(name, type, string, function)
-
- The name of the item being traced is specified using a string
- or the unary name operator. Besides variables, it is also possi-
- ble to trace a particular element of an array or table:
-
- TRACE('VAR1', ...
- TRACE(.A<2,5>, ...
- TRACE('SHIFT', ...
-
- "Type" is a string describing the kind of trace to be per-
- formed. If omitted, a VALUE trace is assumed:
-
- 'VALUE' Trace whenever name has a value assigned to
- it. Assignment statements, as well as condi-
- tional and immediate assignments within pat-
- tern matching will all produce trace mes-
- sages.
-
- 'CALL' Produce a trace whenever function name is
- called.
-
- 'RETURN' Produce a trace whenever function name
- returns.
-
- 'FUNCTION' Combine the previous two types: trace both
- calls and returns of function name.
-
- 'LABEL' Produce a trace when a GOTO transfer to
- statement name occurs. Flowing sequentially
- into the labeled statement does not produce a
- trace.
-
- 'KEYWORD' Produce a trace when keyword name's value is
- changed by the system. The name is specified
- without an ampersand. Only keywords
- &ERRTYPE, &FNCLEVEL, &STCOUNT, and &STFCOUNT
- may be traced.
-
-
-
-
-
- Tutorial - 91 - Debugging and Efficiency
-
-
-
-
-
- When the first argument is specified with the unary name opera-
- tor, the third argument, string, will be displayed to identify
- the item being traced:
-
- TRACE(.T<"zip">, "VALUE", "Table entry 'zip'")
-
- The last argument, function, is usually omitted. Its use is
- described in the next section.
-
- The form of trace message displayed for each type of trace is
- listed in Chapter 20, "System Messages."
-
- Each time a trace is performed, keyword &TRACE is decreased by
- one. Tracing stops when it reaches zero. Tracing of a particu-
- lar item can also be stopped by function STOPTR:
-
- STOPTR(name, type)
-
-
- 10.2.4 Program Trace Functions
-
- Normally, each trace action displays a descriptive message,
- such as:
-
- STATEMENT 371: SENTENCE = 'Ed ran to town',TIME = 810
-
- Instead, we can instruct SNOBOL4 to call our own program-
- defined function. This allows us to perform whatever trace
- actions we wish. We define the trace function in the normal way,
- using DEFINE, and then specify its name as the fourth argument of
- TRACE. For example, if we want function TRFUN called whenever
- variable COUNT is altered, we would say:
-
- &TRACE = 10000
- TRACE('COUNT', 'VALUE', , 'TRFUN')
- DEFINE('TRFUN(NAME,ID)') :(TRFUN_END)
- . . .
-
- TRFUN will be called with the name of the item being traced,
- 'COUNT', as its first argument. If a third argument was provided
- with TRACE, it too is passed to your trace function, as ID.
- (Here the argument was omitted.) To use trace functions effec-
- tively, we must pause to describe a few more SNOBOL4 keywords:
-
- &LASTNO The statement number of the previous SNOBOL4
- statement executed.
-
- &STCOUNT The total number of statements executed.
- Incremented by one as each statement begins
- execution.
-
- &ERRTYPE Error message number of the last execution
- error.
-
-
-
-
- Tutorial - 92 - Debugging and Efficiency
-
-
-
-
-
- &ERRLIMIT Number of nonfatal execution errors allowed
- before SNOBOL4 will terminate.
-
- The first three keywords are continuously updated by SNOBOL4 as
- a program is executed.
-
- Now, let's consider debugging a program where variable COUNT is
- inexplicably being set to a negative number. Continuing with the
- previous example, the function body would look like this:
-
- &TRACE = 10000
- TRACE('COUNT', 'VALUE', , 'TRFUN')
- DEFINE('TRFUN(NAME,ID)TEMP') :(TRFUN_END)
-
- TRFUN TEMP = &LASTNO
- GE($NAME, 0) :S(RETURN)
- OUTPUT = 'COUNT negative in statement ' TEMP :(END)
- TRFUN_END
-
- The first statement of the function captures the number of the
- last statement executed---the statement that triggered the trace.
- We then check COUNT, and return if it is satisfactory. If it is
- negative, we print an error message and stop the program.
-
- When a trace function is invoked, keywords &TRACE and &FTRACE
- are temporarily set to zero. Their values are restored when the
- trace function returns. There is no limit to the number of func-
- tions or items which may be traced.
-
- Tracing keyword &STCOUNT will call your trace function before
- every program statement is executed.
-
- Program CODE.SNO traces keyword &ERRTYPE to trap nonfatal exe-
- cution errors from your sample statements, and produce an error
- message. Keyword &ERRLIMIT must be set nonzero to prevent
- SNOBOL4 from terminating when an error occurs.
-
-
- 10.3 PROGRAM EFFICIENCY
-
- To a greater extent than other languages, SNOBOL4 programs are
- sensitive to programming methods. Often, there are many differ-
- ent ways to formulate a pattern match, and some will require many
- more match attempts than others.
-
- As you work with SNOBOL4, you will develop an intuitive feel
- for the operation of the pattern matcher, and will write more
- efficient patterns. I can, however, start you off with some gen-
- eral rules:
-
- 1. Try to use anchored, quickscan, and trim modes when possi-
- ble. If operating unanchored, artificially anchor whenever
- possible by using POS(0) or FENCE as the first subpattern.
-
-
-
-
- Tutorial - 93 - Debugging and Efficiency
-
-
-
-
-
- 2. Try to use BREAK and SPAN instead of ARB.
-
- 3. Use ANY instead of an explicit list of one-character strings
- and the alternation operator.
-
- 4. LEN, TAB and RTAB are faster than POS and RPOS. The former
- "step over" subject characters in one operation; the latter
- continually fail until the subject cursor is positioned cor-
- rectly. But be careful of misusing them with replacement
- and replacing more than you expected.
-
- 5. Use conditional assignment instead of immediate assignment
- in pattern matching.
-
- 6. Use IDENT and DIFFER to compare strings for equality,
- instead of pattern matching. Since each unique string is
- stored only once in SNOBOL4, these functions merely compare
- one-word pointers, regardless of string length. By con-
- trast, pattern matching and functions such as LGT must
- perform character by character comparisons.
-
- 7. Avoid ARBNO and recursion if possible.
-
- 8. Pattern construction is time-consuming. Preconstruct pat-
- terns and store them in variables whenever possible.
-
- 9. Keep strings modest in length. Although SNOBOL4 allows
- strings to be thousands of characters long, operating upon
- them is very time-consuming. They use large amounts of
- memory, and force SNOBOL4 to frequently rearrange storage.
-
- 10 Use functions to modularize a program and make it easier to
- understand and maintain.
-
- 11 Avoid algorithms that make a linear search of an array or
- list. The algorithms can usually be rewritten using tables
- and indirect references for associative programming.
-
- Efficiency should not be measured purely in terms of program
- execution time. With the relatively low cost of microcomputers,
- the larger picture of time spent designing, coding, and debugging
- a program also must be considered. A direct approach, empha-
- sizing simplicity, robustness, and ease of understanding usually
- outweighs the advantages of tricky algorithms and shortcut tech-
- niques. (But we admit that tricky pattern matching is fun!)
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 94 - Debugging and Efficiency
-
-
-
-
-
- Chapter 11
-
-
- CONCLUDING REMARKS
- -----------------------------------------------------------------
-
- For much of this tutorial we've been concerned with the de-
- tailed mechanics of pattern matching---the functions, primitive
- patterns, and heuristics of applying a pattern to a character
- string. SNOBOL4 provides so many primitive functions and opera-
- tions that it's easy to get lost in the forest. Let's step back
- and consider SNOBOL4's larger significance.
-
- It would be a mistake to think of SNOBOL4 only as a text pro-
- cessing language. For example, programmers in the artificial
- intelligence field think in terms of lists, and have used the
- LISP language for some time. As Shafto demonstrates, SNOBOL4 can
- be made to emulate LISP, and go well beyond it, using pattern
- matching, backtracking, and associative programming (see file
- SNOBOL4.DOC for information on Shafto's report on AI SNOBOL4
- programming.)
-
- SNOBOL4's pattern matching provides a very powerful and com-
- pletely general recognition system, in which character strings
- happen to be the medium of expression. Other recognition pro-
- blems can be solved by mapping the object to be examined into a
- subject string, and the recognition criteria into SNOBOL4
- patterns.
-
- In the past, use of SNOBOL4 has been hindered by the high cost
- and inconvenience of running it on mainframe computers. Now it's
- on your desk top, with computer time essentially free.
-
- What new insights can SNOBOL4 bring to your problems? Can you
- find other general applications for SNOBOL4's unique abilities?
-
- The future of the language is in your hands.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Tutorial - 95 - Concluding Remarks
-
-
-
-
-
- Chapter 12
-
-
- REFERENCE MANUAL -- INTRODUCTION
- -----------------------------------------------------------------
-
- The reference section describes the SNOBOL4 system. It will
- tell you how to create and run SNOBOL4 programs, and catalogs all
- the standard language features. The tutorial section can be con-
- sulted for illustrative uses of various functions and operators.
-
- SNOBOL4 is a full implementation of the powerful development
- language SNOBOL4 for the IBM PC and the entire 8086/286/386 fam-
- ily of computers. It has all the features of mainframe SNOBOL4,
- plus numerous useful extensions. Compatibility with mainframe
- SNOBOL4 is achieved by basing this product on the Macro Implemen-
- tation used on such mainframes as the IBM 370 and the CDC 7600.
- Thus, it incorporates a thoroughly tested implementation in its
- entirety. All SNOBOL4 string and pattern matching facilities
- available in the mainframe environment are now available to the
- personal computer user.
-
- The SNOBOL4 program contains both a compiler and interpreter.
- They are inseparable, and share many common routines. Your
- source program is compiled into a compact internal notation,
- which is interpreted during execution. More information on the
- internal code may be found in Griswold's "The Macro Implementa-
- tion of SNOBOL4;" see file SNOBOL4.DOC for ordering information.
-
-
- 12.1 LANGUAGE BACKGROUND
-
- In 1962, several researchers at Bell Telephone Laboratories
- (BTL) were applying computers to problems such as factoring mul-
- tivariate polynomials and symbolic integration. Available tools
- were the Symbolic Communication Language (SCL), an internal BTL
- product for processing symbolic expressions, and COMIT, designed
- for natural-language analysis. Both proved inadequate, and
- frustration with them led the researchers to attempt the design
- of a new language.
-
- The original SNOBOL was developed by David J. Farber, Ralph E.
- Griswold, and Ivan P. Polonsky, and was first implemented on an
- IBM 7090 computer in 1963. The name, SNOBOL, came after the im-
- plementation, and ostensibly stands for StriNg Oriented symBOlic
- Language.
-
- It was soon discovered that SNOBOL was applicable to a much
- wider range of problems. In fact, the language proved more in-
- teresting than the problems it was intended to solve. As more
- people used it, new features such as recursive functions were
- added, and its generality grew. By 1964, it had become SNOBOL3,
- and was available on such machines as the IBM 7094, CDC 3600, SDS
- 930, Burroughs 5500, and the RCA 601. Because these implementa-
-
-
-
- Reference - 96 - Introduction
-
-
-
-
-
- tions were all written from scratch, each machine introduced its
- own dialect of the language.
-
- SNOBOL3 had only one data type, the string. The desire for ad-
- ditional data types, more complex pattern matching, and other
- features led to a major redesign of the language in 1966, by
- Ralph Griswold, Jim Poage, and Ivan Polonsky. The new lan-
- guage---SNOBOL4---was also designed to be portable to other
- machines. Most of SNOBOL4 was completed by 1967, although some
- features, such as operator redefinition, did not appear until
- 1969. Portability was achieved by writing the system in a macro
- assembly language for an abstract machine, hence the name "Macro
- Implementation of SNOBOL4." By 1970 it was available on nine
- different types of mainframes. Currently, it is available on
- most large- and medium-scale computers.
-
- The SNOBOL4 language evolved on computers whose primary input/
- output devices were the card reader, card punch, and line
- printer. The current breed of microcomputers are interactive,
- rather than batch-oriented. Thus, SNOBOL4 contains slight alter-
- ations of the language to conform to the personal computer envi-
- ronment. For example, the preassigned output keyword PUNCH has
- been replaced by SCREEN. Experienced SNOBOL4 programmers will
- find little incompatibility with familiar implementations. Most
- existing SNOBOL4 programs should operate correctly using SNOBOL4
- with little or no change.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 97 - Introduction
-
-
-
-
-
- Chapter 13
-
-
- RUNNING A SNOBOL4 PROGRAM
- -----------------------------------------------------------------
-
-
- 13.1 BASIC COMMAND LINE FORMAT
-
- The format for the command line is:
-
- SNOBOL4 file options ;Comments
-
- Options are specified by a slash (/) or minus sign (-), and one
- or more option letters. When the option requires a file name, an
- equal sign may be used between the option letter and file name
- for readability.
-
- File The source file contains your SNOBOL4 pro-
- gram. If no file is specified, CON: is as-
- sumed, and programs may be entered directly
- from the keyboard. Disk files will have ex-
- tension .SNO supplied if none is specified.
-
- The source and input files may be assigned to any disk file or
- valid input device. The listing, output, and error message files
- may be assigned to any disk file or valid output device. If the
- output disk file does not exist, it will be created.
-
- /I=file The input file is associated with the vari-
- able INPUT when execution begins, as I/O unit
- 5. The default is CON:, your keyboard. Disk
- files will have extension .IN supplied if
- none is specified.
-
- /L=file The listing file receives a listing of your
- program, with assigned statement numbers.
- Default is NUL:, that is, the listing is dis-
- carded. If /L appears without a file name,
- the source program file name will be used,
- with the extension changed to .LST.
-
- /O=file The output file is associated with the vari-
- able OUTPUT when execution begins. This will
- be I/O unit 6. The default is CON:, which is
- usually your computer's display screen. Disk
- files will have extension .OUT supplied if
- none is specified. Execution dumps and trac-
- ings are sent to I/O unit 6.
-
-
-
-
-
-
-
-
- Reference - 98 - Running a SNOBOL4 Program
-
-
-
-
-
- /E=file A list of compilation and runtime error mes-
- sages is written to this file. Default is
- CON:, that is, error messages are displayed
- on the screen. If /E appears without a file
- name, the source program file name will be
- used, with the extension changed to .ERR.
-
- In addition to the /I and /O options, the INPUT and OUTPUT
- variables may also be assigned to files by using the MS-DOS redi-
- rection operators < and > on the command line.
-
- Other I/O files may be specified explicitly within the INPUT
- and OUTPUT functions, or on the command line with a unit number:
-
- /n=file The specified file becomes associated with
- unit number n. N must be in integer between
- 1 and 16. If your program calls the INPUT or
- OUTPUT function without a file name, the file
- specified here will be used. This command
- line option merely makes an association; the
- file is not opened or created until the INPUT
- or OUTPUT function is called.
-
- File names may be a disk file, or any DOS device, such as NUL:,
- CON:, LPT2:, etc.
-
- The remaining option switches alter SNOBOL4's behavior:
-
- /B Termination messages and statistics are nor-
- mally displayed via I/O unit 7 (SCREEN). The
- /B (batch) option instead directs them to I/O
- unit 6 (OUTPUT).
-
- /C SNOBOL4 defaults to case-folding, making
- lower and upper case alphabetics equivalent
- for names and labels. Specifying this option
- inhibits case-folding: upper and lower case
- names are unique and distinct.
-
- /D Sets the &DUMP keyword to 1. This is useful
- when you decide you want an end-of-run vari-
- able dump, and don't want to edit the source
- file.
-
- /H Displays summary of options and Vanilla
- SNOBOL4 license information.
-
- /NX No execution after compilation.
-
- /NP Suppress column position information in error
- messages.
-
- /P Displays additional product information.
-
-
-
-
- Reference - 99 - Running a SNOBOL4 Program
-
-
-
-
-
- /S Provide statistics upon termination.
-
- Vanilla SNOBOL4 works very nicely with text editors that allow
- a program to be compiled from within the editor. If a compila-
- tion or runtime error occurs, you are returned to your editor
- with the cursor positioned on the troublesome statement. To use
- with your editor, you will need to use the command line option
- "/BE-". This writes errors messages to standard output, where
- they can be captured by your text editor.
-
-
- 13.2 PROVIDING YOUR OWN PARAMETERS
-
- The keyword &PARM contains the command line string. It begins
- with the blank following the word SNOBOL4, and contains all char-
- acters up to the terminating carriage return. Since SNOBOL4's
- command processor ignores all characters after a semicolon, com-
- ments placed there can easily communicate additional instructions
- to your program. Break them out with the statement:
-
- &PARM ';' REM . INSTRUCTIONS
-
-
- 13.3 COMMAND LINE EXAMPLES
-
- The command line:
-
- SNOBOL4 PROG
-
- will compile and run a source program from file PROG.SNO, discard
- the listing, and run it with keyboard input and screen output.
- The command line:
-
- SNOBOL4 CONVERT /I=DATA /O=RESULT /2=STYLE.DAT ;DRAFT
-
- will run a program that presumably transforms input file DATA.IN
- to output file RESULT.OUT according to program option 'DRAFT'.
- I/O unit number 2 is associated with the file STYLE.DAT. The
- program can use the variable SCREEN to post error and status mes-
- sages to the user, regardless of the reassignment of the input
- and output files.
-
- SNOBOL4 SOURCE /I=SOURCE.SNO /L=OUTPUT /O=OUTPUT.LST /BCS
-
- sets up a "conventional" batch job, with source program and input
- data on file SOURCE.SNO (following the END statement), listing
- and program output to OUTPUT.LST, no case-folding, and end-of-run
- statistics.
-
-
-
-
-
-
-
-
-
- Reference - 100 - Running a SNOBOL4 Program
-
-
-
-
-
- Chapter 14
-
-
- STATEMENTS
- -----------------------------------------------------------------
-
- Each line of input to SNOBOL4 consists of a sequence of ASCII
- characters, terminated by a carriage return.
-
- Comment and control statements are always one line long. How-
- ever, a program statement may occupy several lines if necessary.
- A continuation mark (plus sign or period) is placed in the first
- column of the additional lines.
-
-
- 14.1 COMMENT STATEMENTS
-
- An asterisk (*) in character position one denotes a comment
- card. All text through the end-of-line is copied to the listing
- file, but is otherwise ignored by SNOBOL4.
-
-
- 14.2 CONTROL STATEMENTS
-
- Control statements provide instructions to the SNOBOL4 com-
- piler. They begin with a minus (-) in character position one.
- Controls may be specified in upper- or lower-case, regardless of
- the current state of case-folding. Unrecognized controls are
- ignored.
-
- -CASE n Fold lower-case names to upper-case if n is
- nonzero. Treat upper- and lower-case names
- as distinct if n is zero or absent.
-
- -EJECT Start a new page on the listing file.
-
- -LIST Equivalent to -LIST LEFT.
-
- -LIST LEFT Turn on list output, produce statement num-
- bers at left end of line.
-
- -LIST RIGHT Turn on list output, produce statement num-
- bers at right end of line.
-
- -UNLIST Turn off list output. Errors are not shown
- on the screen.
-
- SNOBOL4 defaults to -LIST LEFT and -CASE 1.
-
-
- 14.3 PROGRAM STATEMENTS
-
- If a line is not a control or comment statement, it is consid-
- ered SNOBOL4 program text. A SNOBOL4 statement may have up to
-
-
-
- Reference - 101 - Statements
-
-
-
-
-
- five components. The general form of a statement is:
-
- LABEL SUBJECT PATTERN = REPLACEMENT :GOTO
-
- Statement elements are separated by blank or tab.
-
- Ignoring the LABEL and GOTO fields for a moment, the remaining
- elements may appear in various combinations to create different
- types of statements:
-
-
- Evaluate expression: SUBJECT
-
- The expression comprising the subject is evaluated. It may in-
- voke primitive and program-defined functions.
-
-
- Assignment statement: SUBJECT = REPLACEMENT
-
- The value on the right is assigned to the variable on the left.
- If failure occurs when evaluating the subject or replacement com-
- ponents, the assignment does not occur.
-
-
- Pattern match: SUBJECT PATTERN
-
- The subject and pattern expressions are evaluated, and the
- specified pattern is applied to the subject string, producing
- success or failure.
-
-
- Pattern match with replacement: SUBJECT PATTERN = REPLACEMENT
-
- If the pattern match succeeds, the replacement expression is
- evaluated and replaces the portion of the subject matched. Only
- the matched portion is replaced; characters adjacent to the
- matching substring are not disturbed.
-
- If the equal sign (=) is present but the replacement field is
- absent, the null string is assumed as the value of the replace-
- ment field.
-
- The GOTO field provides two-way branching to test the success
- or failure of the preceding statement elements.
-
-
- 14.3.1 Label Field
-
- If a label is present, it must begin with the first character
- of the line. Labels provide a name for the statement, and serve
- as the target for transfer of control from the GOTO field of any
- statement. Labels must begin with a letter or digit, optionally
- followed by an arbitrary string of characters. The label field
- is terminated by the character blank, tab, or semicolon. If the
-
-
-
- Reference - 102 - Statements
-
-
-
-
-
- first character of a line is blank or tab, the label field is
- absent.
-
- If case-folding is in effect, lower-case letters are converted
- to upper-case before defining the label.
-
-
- 14.3.2 Subject Field
-
- The subject field specifies the string which will be the sub-
- ject of pattern matching. It also specifies the left side of a
- simple assignment statement if pattern matching is absent.
-
- In an assignment statement, the subject must be a variable
- name, an unprotected keyword, or a field-reference function from
- a program-defined data type. If a string is produced by evaluat-
- ing an expression, the indirect ($) operator must be used to
- reference the underlying variable.
-
- If the subject appears in pattern matching without replacement,
- the subject must evaluate to a string. The string is scanned
- left to right during the pattern match. If the subject evaluates
- to an integer, it is automatically converted to a string. If re-
- placement is present, the same subject restrictions of assignment
- statements apply. Thus, a literal string is a valid subject only
- if replacement is absent.
-
- If the expression comprising the subject contains the concate-
- nation operator, the subject must be surrounded by parenthesis.
- This allows SNOBOL4 to distinguish concatenation blanks within
- the subject from the blank between subject and pattern.
-
-
- 14.3.3 Pattern Field
-
- The pattern may be a simple string, or a complex expression in-
- volving primitive pattern functions. The pattern specifies one
- or more strings which are systematically searched for in the sub-
- ject. The pattern match succeeds if a match is found, and fails
- otherwise. The &FULLSCAN keyword determines whether the search
- is exhaustive, or if heuristics will be applied to prevent futile
- match attempts.
-
- The pattern may assign various matching components to variables
- with the binary assignment operators dot and dollar sign (., $).
-
-
- 14.3.4 Replacement Field
-
- In an assignment statement, there are very few restrictions on
- the replacement field. If the subject is an unprotected keyword,
- the replacement field must evaluate to an integer value. If the
- subject is a variable, the replacement field is assigned directly
- to it, without type conversion.
-
-
-
- Reference - 103 - Statements
-
-
-
-
-
- If there is pattern matching on the left side of the statement,
- the replacement field must evaluate to a string, so that it may
- be inserted into the matched portion of the subject string.
-
- Replacement occurs only if evaluation of the subject, pattern,
- and replacement succeed. Primitive functions which return suc-
- cess or failure may be used in the replacement field as predicate
- functions. Since they return the null string, they do not alter
- the replacement value. However, their failure can prevent re-
- placement from occurring, and can be tested in the GOTO field.
-
-
- 14.3.5 GOTO Field
-
- Statement execution normally proceeds sequentially from one
- statement to the next. The GOTO field allows this flow to be al-
- tered by directing the SNOBOL4 system to continue execution else-
- where. The GOTO field is set off from the preceding statement
- elements by blank or tab, and colon (:). It may assume three
- forms: unconditional, conditional, and direct.
-
- The "unconditional GOTO" causes control to be transferred to
- the specified labeled statement. The label is enclosed in paren-
- thesis, and may be a name, or the result of evaluating an expres-
- sion and applying the indirect operator ($). Transfer is made to
- the labeled statement regardless of the success or failure out-
- come of the earlier parts of the statement.
-
- The "conditional GOTO" similarly specifies control transfer to
- a labeled statement, but it depends on the success or failure of
- the statement. The letter S precedes the parenthesized label
- where control goes next if the statement succeeds. The letter F
- specifies the branch to be taken if the statement fails. For
- example:
-
- :S(LOOP) Branches to label LOOP if the statement suc-
- ceeds.
-
- :F(ERROR) Branches to label ERROR if the statement
- fails.
-
- :S(OK) F(NOGO) Branches to label OK on success, to NOGO on
- failure.
-
- :(AGAIN) Unconditionally transfers control to label
- AGAIN.
-
- :($('VAR' N)) Branches to the label obtained by concatenat-
- ing the string 'VAR' with the value of vari-
- able N.
-
- The "direct GOTO" is used to branch to a block of code compiled
- with the CODE function. If the code contains labels, a regular
- GOTO could branch to the label and begin execution in the code
-
-
-
- Reference - 104 - Statements
-
-
-
-
-
- block. The direct GOTO will branch to the start of the code
- block, labeled or not. A direct GOTO is specified by placing in
- angle brackets the name of the variable which points to the code
- block.
-
- Direct GOTOs may be made conditional by preceding them with S
- or F. They may also appear with regular GOTOs:
-
- VAR = CODE(string) :S<VAR> F(COMPILE_ERROR)
-
- The lower-case letters "s" and "f" may be used interchangeably
- with "S" and "F", regardless of case-folding.
-
- The GOTO field may appear on a line without any subject, pat-
- tern, and replacement. The absent SNOBOL4 statement is assumed
- to have succeeded.
-
-
- 14.4 CONTINUATION STATEMENTS
-
- A SNOBOL4 statement may be divided across several lines by
- placing a plus (+) or period (.) in character position one of the
- successive lines. There is no limit to the number of continua-
- tion statements allowed. The statement must be divided at a
- point where a blank or tab could appear as an operator or separa-
- tor; it cannot be split in the middle of a name or quoted string.
-
- Very long strings may be entered on multiple lines, using the
- implicit blank between lines as a concatenation operator:
-
- LONG_STRING = "This is an example of a very long "
- + "string that wends its way across multiple continua"
- + "tion statements. There is an implicit blank at the "
- + "beginning of each line that provides the concatenation"
- + " operator between segments."
-
-
- 14.5 MULTIPLE STATEMENTS
-
- The semicolon character may be used to place several statements
- on one line. Each semicolon terminates the current statement and
- behaves like a new "column one" for the statement which follows.
- Only program statements are permitted after the semicolon; con-
- trol and continuation statements are not allowed. Here are some
- examples:
-
- I = 1; J = 2; S PAT = 'HENRI' :S(YES)
- I = 1;OUT OUTPUT = A<I> :F(END); I = I + 1 :(OUT)
-
- Because of its poor readability, placing labels in the middle
- of a statement is strongly discouraged.
-
- As a language extension, Vanilla SNOBOL4 permits a comment
- statement after the semicolon. This provides a simple device for
-
-
-
- Reference - 105 - Statements
-
-
-
-
-
- end-of-line comments:
-
- PARA NEXT = GETNEXT() :F(FRETURN) ;* Return if EOF
- IDENT(NEXT) :S(RETURN) ;* Return on empty line
- PARA = PARA NEXT :(PARA) ;* Splice line
-
-
- 14.6 THE END STATEMENT
-
- The last statement in a program must be an END statement. The
- word END appears in the label field, beginning in column one.
- Normally, it is the only word on the line:
-
- . . .
- OUTPUT = 'All done'
- END
-
- After reading the END statement, compilation ends, and execu-
- tion begins immediately with the very first program statement.
- When the program is done, it should flow into the END statement,
- or use a GOTO to transfer to it.
-
- Occasionally, we would like to begin execution at other than
- the first statement. If we place a statement label in the sub-
- ject field of the END statement, execution will begin there. For
- example, this statement will cause execution to begin at the
- statement labeled START:
-
- END START
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 106 - Statements
-
-
-
-
-
- Chapter 15
-
-
- OPERATORS
- -----------------------------------------------------------------
-
- Following are lists of all the unary and binary operators in
- SNOBOL4. Unused operators may be attached to program-defined
- functions using the OPSYN function. Unary operators have equal
- precedence among themselves, and higher precedence than binary
- operators. Operators of higher precedence are performed first,
- unless reordered by parentheses. Where several instances of
- operators with the same priority appear, associativity specifies
- which one is performed first.
-
-
- 15.1 UNARY OPERATORS
-
- All unary operators are left-associative: if several appear to-
- gether, they are performed left-to-right.
-
- Graphic Name Definition
- ======= ================= ==============================
- + plus arithmetic positive
- - minus arithmetic negative
- . period name of object (address)
- $ dollar sign indirect reference through object
- * asterisk unevaluated expression
- & ampersand keyword
- ~ tilde negation of success/failure
- ? question mark interrogation
- @ at sign cursor position assignment
- / slash <none>
- ^, ! caret, exclamation <none>
- % percent <none>
- # pound sign <none>
- | vertical bar <none>
-
-
-
- 15.5.1 Indirect Reference and Case-Folding
-
- The indirect reference operator ($) converts a string to a
- variable name. When case-folding is in effect, the string char-
- acters are treated as upper-case letters when producing the name.
- The string itself is not modified. Thus,
-
- $('abc')
-
- references variable ABC when case-folding, and variable abc when
- not.
-
-
-
-
-
-
- Reference - 107 - Operators
-
-
-
-
-
- 15.2 BINARY OPERATORS
-
- Graphic Name Definition Precedence Associates
- ======= ================= =============== ========== ==========
- ~ tilde <none> 12 right
- ? question mark <none> 12 left
- . period conditional assignment 11 left
- $ dollar sign immediate assignment 13 left
- ^, ! caret, exclamation exponentiation 12 right
- ** double asterisk exponentiation 12 right
- % percent <none> 11 left
- * asterisk multiplication 10 left
- / slash division 9 left
- # pound sign <none> 8 left
- + plus addition 7 left
- - minus subtraction 7 left
- @ at sign <none> 6 left
- blank blank concatenation 5 left
- tab tab concatenation 5 left
- | vertical bar alternation 4 left
- & ampersand <none> 3 left
- = equal sign assignment 1 right
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 108 - Operators
-
-
-
-
-
- Chapter 16
-
-
- KEYWORDS
- -----------------------------------------------------------------
-
- Keywords allow a program to communicate with SNOBOL4. Their
- names are set apart from other variables by the unary operator
- ampersand (&). Protected keywords cannot be changed by a pro-
- gram, while unprotected keywords can.
-
- Several protected keywords can be traced using the TRACE func-
- tion: &ERRTYPE, &FNCLEVEL, &STCOUNT, and &STFCOUNT. Tracing oc-
- curs each time SNOBOL4 alters their value. For example, tracing
- keyword &STCOUNT produces a trace after every SNOBOL4 statement
- is executed.
-
-
- 16.1 PROTECTED KEYWORDS
-
- Among these keywords are several which serve as read-only
- repositories of fundamental system patterns and values, such as
- &ARB. The nonkeyword form (ARB) may be changed by a program, and
- later restored to its original value by assigning it the corre-
- sponding keyword.
-
- &ABORT The primitive pattern ABORT.
-
- &ALPHABET String of 256 ASCII character values in as-
- cending order.
-
- &ARB The primitive pattern ARB.
-
- &BAL The primitive pattern BAL.
-
- &ERRTEXT String containing most recent system gener-
- ated error text.
-
- &ERRTYPE Integer code of the last execution error to
- occur. This keyword may be traced with func-
- tion TRACE().
-
- &FAIL The primitive pattern FAIL.
-
- &FENCE The primitive pattern FENCE.
-
- &FNCLEVEL Integer depth of program-defined function
- calls. It is initially zero, and incremented
- by one for each function call, and decre-
- mented for each function return. This key-
- word may be traced.
-
- &LASTNO Integer statement number of the previous
- statement executed.
-
-
-
- Reference - 109 - Keywords
-
-
-
-
-
- &LCASE The 26 lower-case alphabetic letters.
-
- &PARM The command string used to invoke SNOBOL4.
- Begins with the blank following the word
- SNOBOL4.
-
- &REM The primitive pattern REM.
-
- &RTNTYPE Contains a string describing the type of re-
- turn most recently made by a program-defined
- function, either 'RETURN', 'FRETURN', or
- 'NRETURN'.
-
- &STCOUNT Integer count of the number of statements
- executed. This keyword may be traced. Since
- integers are 16-bit quantities, executing
- more than 32,767 statements will cause this
- keyword to overflow. No harm results, and
- the keyword may still be traced, but its
- value will be a large negative number.
-
- &STFCOUNT Integer count of the number of statements
- which failed. This keyword may be traced.
- The same overflow problem discussed for
- &STCOUNT occurs with this keyword.
-
- &STNO Integer statement number of the current
- statement being executed.
-
- &SUCCEED The primitive pattern SUCCEED.
-
- &UCASE The 26 upper-case alphabetic letters.
-
-
- 16.2 UNPROTECTED KEYWORDS
-
- These keywords may be set to integer values to modify SNOBOL4's
- behavior.
-
- &ANCHOR Nonzero for anchored pattern match. Ini-
- tially 0, unanchored.
-
- &CASE Zero to prevent case-folding during compila-
- tion with the functions CODE and EVAL. Ini-
- tially 1, causing case-folding to occur.
-
- &CODE The end-of-job code is an integer value in
- the range 0 to 255 returned to the operating
- system. It can be tested with the DOS Batch
- condition ERRORLEVEL. Initially 0.
-
-
-
-
-
-
-
- Reference - 110 - Keywords
-
-
-
-
-
- &DUMP Nonzero to list unprotected keywords and
- variables with nonnull values at program ter-
- mination. A positive value causes the list
- to be sorted; negative values leave them un-
- sorted. Initially 0. The dump is produced
- to I/O unit 6 (OUTPUT).
-
- &ERRLIMIT Determines the number of conditionally fatal
- execution errors permitted before terminating
- a program. The Execution Error Messages sec-
- tion of Chapter 20, "System Messages," de-
- scribes the errors which are conditionally
- fatal. Initially 0, causing SNOBOL4 to stop
- if any error occurs.
-
- &FTRACE Nonzero value causes each call and return of
- a program-defined function to be listed.
- Decremented for each trace. Initially 0.
-
- &FULLSCAN Nonzero to disable pattern matching heuris-
- tics. Initially 0, the quickscan mode of
- pattern matching.
-
- &INPUT Zero to disable all input. When disabled,
- using variable INPUT (or other input-associ-
- ated variables) does not read data from the
- file. Initially 1, input is enabled.
-
- &MAXLNGTH Maximum string length. Initially 5000, maxi-
- mum value is 32767. Memory limitations in
- Vanilla SNOBOL4 will limit actual strings to
- a smaller size.
-
- &OUTPUT Zero to disable all output. When disabled,
- assigning data to OUTPUT or SCREEN (or other
- output-associated variables) does not write
- data to the file. Initially 1, output is en-
- abled.
-
- &STLIMIT The number of statements allowed to execute.
- If positive, it is decremented for each
- statement executed; execution terminates when
- it reaches 0. If negative, there is no
- limit, and it is not decremented. Initially
- -1.
-
- &TRACE Nonzero to permit tracing with the TRACE
- function. Initially 0, it is decremented for
- each trace performed.
-
- &TRIM Nonzero to strip trailing blanks from lines
- read from ASCII files. This is faster than
- using the TRIM function. It does not strip
- trailing tab characters. Initially 0: blanks
-
-
-
- Reference - 111 - Keywords
-
-
-
-
-
- are not removed and short records are blank
- padded to the file's standard record length.
-
-
- 16.3 SPECIAL NAMES
-
- The following names have special meaning to SNOBOL4. If case-
- folding is in effect, they may appear with any combination of
- upper- and lower-case letters.
-
- END This is a special label which denotes the
- last statement of the user's program. An op-
- tional label may follow the word END (in the
- subject field) to denote where program execu-
- tion is to begin. A program should terminate
- execution by transferring to label END.
-
- FRETURN Transfer to this label to return from a
- program-defined function with a failure indi-
- cation.
-
- INPUT Variable associated with input from unit
- number 5.
-
- NRETURN Transfer to this label to return successfully
- from a program-defined function by name,
- rather than by value. The function name
- should be assigned a name result (usually
- with the period (.) unary operator). This
- permits a function call to be the object of
- an assignment operation.
-
- OUTPUT Variable associated with output to unit
- number 6.
-
- RETURN Transfer to this label to return from a
- program-defined function with a success indi-
- cation. A value may be returned as the func-
- tion's result; simply assign it to a variable
- with the same name as the function before
- transferring to RETURN.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 112 - Keywords
-
-
-
-
-
- Chapter 17
-
-
- DATA TYPES AND CONVERSION
- -----------------------------------------------------------------
-
- Most other programming languages require the user to explicitly
- declare the type of data to be stored in a variable. In SNOBOL4,
- any variable may contain any data type. Furthermore, the vari-
- able's type may be freely altered during program execution.
- SNOBOL4 remembers what kind of data is in each variable.
-
-
- 17.1 DATA TYPE NAMES
-
- The formal name of a data type is specified by an upper-case
- string (or lower-case if case-folding is in effect), such as
- 'INTEGER', or 'ARRAY'. It is used with the CONVERT function to
- specify the data type conversion desired. The formal name is
- also the string returned when the DATATYPE() function is used to
- determine an object's type.
-
- -----------------------------------------------------------------
-
- ARRAY N-dimensional array
-
- The primitive function ARRAY() creates an array storage area, and
- returns a pointer with this data type. If this pointer is stored
- in a variable, the variable is said to be of type ARRAY, and may
- then be subscripted to access the elements of the array.
-
- -----------------------------------------------------------------
-
- CODE Compiled SNOBOL4 code
-
- The primitive function CODE() compiles a string containing
- SNOBOL4 statements, and returns a pointer to the resulting object
- code block. If this pointer is stored in a variable, the vari-
- able is said to be of type CODE. The variable may then be used
- with a direct GOTO by enclosing it in angle brackets.
-
- -----------------------------------------------------------------
-
- EXPRESSION Unevaluated expression
-
- When the unevaluated expression operator (*) is applied to an ex-
- pression, the result has the data type EXPRESSION. Such expres-
- sions are not evaluated when they are defined, only when they are
- referenced.
-
- E = *(LEN(K) POS(M))
-
- defines E as an unevaluated expression. When this statement is
- executed, the code to concatenate two function calls is compiled,
-
-
-
- Reference - 113 - Data Types and Conversion
-
-
-
-
-
- but not executed. It is only when E is referenced in a subse-
- quent pattern match or appears as the argument of the EVAL
- function that the code is executed to produce a pattern.
-
- The unevaluated expression operator must be at the outermost
- level to create an object of type EXPRESSION. If buried with the
- expression, the execution results may appear to be similar, but
- the object's data type is different. That is, the two statements
-
- P = *LEN(N)
- P = LEN(*N)
-
- produce identical results when P is used in a pattern match (if
- LEN is not redefined). However, the first statement produces P
- as type EXPRESSION, while the second produces P as type PATTERN.
- Expressions may also be produced explicitly with the CONVERT()
- function (see below).
-
- -----------------------------------------------------------------
-
- EXTERNAL Created by external function
-
- External assembly language functions may create new data types
- whose structure is known only to them. This feature is only
- available in SNOBOL4+, Catspaw's enhanced implementation of the
- SNOBOL4 language.
-
- -----------------------------------------------------------------
-
- INTEGER Integer number
-
- A decimal number in the range -32767 to +32767. No fractional
- part may appear. One computer word (16 bits) is used to contain
- an integer value.
-
- -----------------------------------------------------------------
-
- NAME Name of a variable
-
- When the unary name operator (.) is applied to a variable, two
- results are possible. If the variable's name is a simple string
- (a "natural variable"), such as ABC, the variable's name is
- returned as type STRING. For example, .ABC has the value 'ABC'.
- However, if the variable is a created variable, such as a table
- or array element, the NAME data type results. In either case,
- the result of the name operator can be thought of as the
- "address" or "storage location" of the variable. When the indi-
- rect reference operator ($) is applied to such a result, the
- original, underlying object is obtained. That is, $(.A) is the
- same as using the variable A.
-
- For natural variables, SNOBOL4 has the surprising property that
- the string 'XYZ' is the address (or name) of variable XYZ, so
- $'XYZ' is equivalent to XYZ.
-
-
-
- Reference - 114 - Data Types and Conversion
-
-
-
-
-
- -----------------------------------------------------------------
-
- PATTERN Pattern match structure
-
- A pattern is created by an expression containing any of the fol-
- lowing: other patterns, primitive patterns, pattern functions,
- the alternation operator (|), the conditional or immediate as-
- signment operator (. or $), or the cursor position operator (@).
- A simple string is not a pattern data type, even though it may
- appear in the pattern portion of a statement. The following are
- examples of the pattern data type:
-
- POS(0) "A" LEN(1)
- "COLUMN A" | "COLUMN B"
- "ZIP" . X
- "MATCH" @Y
-
- -----------------------------------------------------------------
-
- Program-defined data type Created by DATA() function
-
- Up to 899 new data types may be created with the primitive func-
- tion DATA. The name specified in the prototype string becomes a
- new data type in SNOBOL4. Any object created with the data
- type's creation function is given this name as its data type.
-
- DATA('COMPLEX(REAL, IMAG)') ;* Define new type COMPLEX
- NUM = COMPLEX(2, -4) ;* Create a COMPLEX object
- OUTPUT = DATATYPE(NUM) ;* Print string 'COMPLEX'
-
- -----------------------------------------------------------------
-
- REAL Real number
-
- A floating-point decimal number in the range 2.3E-308 to
- 1.7E+308. Reals are only available in SNOBOL4+, Catspaw's
- enhanced implementation of the SNOBOL4 language.
-
- -----------------------------------------------------------------
-
- STRING Character string
-
- A sequence of characters. Each character occupies one memory
- byte, and may contain any of the 256 possible bit combinations.
- A string of length zero is called the null string. Maximum
- length of a string is determined by the keyword &MAXLNGTH
- (default 5000). Memory restrictions in Vanilla SNOBOL4 will
- limit the longest string possible to less than the 32767
- characters allowed in SNOBOL4+, Catspaw's enhanced SNOBOL4
- implementation.
-
-
-
-
-
-
-
- Reference - 115 - Data Types and Conversion
-
-
-
-
-
- -----------------------------------------------------------------
-
- TABLE Associatively referenced table
-
- The primitive function TABLE() creates a table storage area, and
- returns a pointer with this data type. If this pointer is stored
- in a variable, the variable is said to be of type TABLE. The
- variable may then be subscripted to access the elements of the
- table. A table may be thought of as a one dimensional array in
- which the array subscripts may be any SNOBOL4 data type. Arrays
- require integer subscripts, but table subscripts such as
- T<"TALLY"> or T<13.52> are acceptable.
-
-
- 17.2 DATA TYPE CONVERSION
-
- Data may be implicitly or explicitly converted from one type to
- another.
-
-
- 17.2.1 Implicit Conversion
-
- Implicit conversion occurs automatically when SNOBOL4 requires
- a certain data type, and your program provides it in another
- form. Conversion to the correct data type will be attempted, and
- an error message given if conversion is not possible.
-
-
- 17.2.2 Explicit Conversion
-
- A program may use the CONVERT() function to explicitly convert
- an object to another data type. The first argument is the object
- to be converted; the second is a string containing the formal
- name of the desired data type. The formal name must be in upper-
- case (lower-case allowed if case-folding). If conversion is
- possible, the function succeeds and returns the converted object.
- If not, the function fails. The call looks like this:
-
- NEWTYPE = CONVERT(OBJECT, "DESIRED TYPE")
-
-
- 17.2.3 Permissible Conversions
-
- -----------------------------------------------------------------
-
- ARRAY to STRING
-
- The formal name "ARRAY" is produced. The defining array dimen-
- sion string is appended if less than 20 characters:
-
- A = ARRAY('1:50,6')
- OUTPUT = A
-
- produces the string "ARRAY('1:50,6')".
-
-
-
- Reference - 116 - Data Types and Conversion
-
-
-
-
-
- -----------------------------------------------------------------
-
- CODE to STRING
-
- The formal name "CODE" is produced:
-
- C = CODE(' PIT2 = .OPPIT4 :(RETURN)')
- OUTPUT = C
-
- displays the string "CODE".
-
- -----------------------------------------------------------------
-
- EXPRESSION to PATTERN
-
- This occurs implicitly within a pattern match, or by using the
- EVAL function. The deferred expression is evaluated, using
- current values for any variables which appear. Example:
-
- LASTN = *(RTAB(N) REM . LCHARS)
- . . .
- N = 4
- SUBJECT LASTN :F(TOO_SHORT)
-
- -----------------------------------------------------------------
-
- EXPRESSION to STRING
-
- The formal name "EXPRESSION" is produced. For example,
-
- LASTN = *(RTAB(N) REM . LCHARS)
- OUTPUT = LASTN
-
- produces the string "EXPRESSION".
-
- -----------------------------------------------------------------
-
- INTEGER to PATTERN
-
- This only occurs implicitly within a pattern match. The integer
- is converted to a string, and the string converted to a pattern.
- Example:
-
- SUBJECT 19 = ''
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 117 - Data Types and Conversion
-
-
-
-
-
- -----------------------------------------------------------------
-
- INTEGER to STRING
-
- Leading zeros are suppressed, and a minus sign appears if the
- integer was negative. Integer zero is converted to the string
- "0". For example,
-
- A = -23; B = 0; C = 92
- OUTPUT = A B C
-
- produces the string "-23092".
-
- -----------------------------------------------------------------
-
- NAME to STRING
-
- The formal name "NAME" is produced:
-
- N = .A[2]
- OUTPUT = N
-
- displays the string "NAME".
-
- -----------------------------------------------------------------
-
- PATTERN to STRING
-
- The formal name "PATTERN" is produced. For example,
-
- WPAT = BREAK(LETTERS) SPAN(LETTERS) . WORD
- OUTPUT = WPAT
-
- produces the string "PATTERN".
-
- -----------------------------------------------------------------
-
- DEFINED DATA TYPE to STRING
-
- The formal name from the defining DATA function call is returned.
-
- DATA('COMPLEX(REAL,IMAG)')
- R1 = COMPLEX(2, 3)
- OUTPUT = R1
-
- produces the string "COMPLEX".
-
- -----------------------------------------------------------------
-
- STRING to INTEGER
-
- The string must not have any leading or trailing blanks. A lead-
- ing plus or minus sign is allowed, but must be followed by at
- least one digit. Leading zeros are allowed, and the resulting
-
-
-
- Reference - 118 - Data Types and Conversion
-
-
-
-
-
- value must be in the legal range for integer values. A null
- string is converted to integer zero.
-
- RESULT = ("-14" + "") / "2"
-
- stores integer -7 in RESULT.
-
- -----------------------------------------------------------------
-
- STRING to PATTERN
-
- This only occurs implicitly within a pattern match. The pattern
- created will match the specified substring:
-
- SUBJECT "HOPE"
-
- -----------------------------------------------------------------
-
- TABLE to ARRAY
-
- This only occurs when using the CONVERT function. The table is
- converted to a two dimensional array. Example:
-
- T = TABLE(100)
- . . .
- A = CONVERT(T, "ARRAY") :F(EMPTY)
-
- The table is converted to a rectangular array. Null table
- entries are omitted, and there must be at least one nonnull entry
- or the function fails. An N by 2 array is created, where N is
- the number of nonnull table values. The first array column con-
- tains the table subscripts, the second column contains the entry
- values.
-
- -----------------------------------------------------------------
-
- TABLE to STRING
-
- The formal name "TABLE" is returned with the present size of the
- table and its expansion increment. For example,
-
- T = TABLE(10,10)
- . . .
- ; Insert 45 nonnull elements into T
- . . .
- OUTPUT = T
-
- produces the string "TABLE(50,10)" (because table segments in
- this case are allocated in multiples of 10).
-
- The following matrix indicates conversions with CONVERT():
-
-
-
-
-
-
- Reference - 119 - Data Types and Conversion
-
-
-
-
-
- | Result Type E
- | X
- | P
- | I P R D
- | S N A E E
- | T T T A T S F
- | R E T N R A C S I
- | I G E A R B O I N
- Argument | N E R M A L D O E
- Type | G R N E Y E E N D
- -----------+-----------------------------------
- STRING | * I P C E
- INTEGER | S * P
- PATTERN | F *
- NAME | F *
- ARRAY | A * 1
- TABLE | T 2 *
- CODE | F *
- EXPRESSION | F P *
- DEFINED | F *
-
- * The argument object is returned unchanged.
-
- A The formal data type name "ARRAY" is returned with the
- defining prototype string if it is less than 20 characters.
-
- C CONVERT(string,"CODE) behaves exactly like CODE(string).
-
- E Produces an unevaluated expression, that may be subsequently
- used in a pattern, or evaluated with the EVAL() function.
-
- F The formal data type name is returned.
-
- I Numeric conversion is conditioned on magnitude and syntax
- restrictions. No leading or trailing blanks are permitted.
-
- P Occurs implicitly within a pattern match.
-
- S A number may always be converted to its string form.
-
- T The string "TABLE" is returned with the present size of the
- table and its expansion increment: "TABLE(50,10)".
-
- 1 The array must be rectangular, with a second dimension of 2
- (N rows by 2 columns). A table with N entries is created.
- The table subscripts are taken from the first column of the
- array; the table values are copied from the second column.
-
- 2 The table is converted to a rectangular array. Null table
- entries are omitted, and there must be at least one nonnull
- entry or the function fails. An N by 2 array is created,
- where N is the number of nonnull table values. The first
- array column contains the table subscripts, the second col-
- umn contains the entry values.
-
-
-
- Reference - 120 - Data Types and Conversion
-
-
-
-
-
- Chapter 18
-
-
- PATTERNS AND PATTERN FUNCTIONS
- -----------------------------------------------------------------
-
- The SNOBOL4 pattern matcher is called the "scanner." The
- "cursor" is the scanner's pointer into the subject string; it
- points between subject characters (no relation to your CRT cur-
- sor). It is initially zero when positioned to the left of the
- subject, and is incremented as the scanner moves to the right in
- the subject.
-
-
- 18.1 PRIMITIVE PATTERNS
-
- These variables initially contain the primitive patterns of the
- same name. They may be set to other values by a program, and re-
- stored to their original value from the corresponding protected
- keywords.
-
- ABORT Causes immediate failure of the entire pat-
- tern match, without seeking alternatives.
-
- ARB Matches zero or more characters of the sub-
- ject string. It matches the shortest possi-
- ble substring.
-
- BAL Matches any nonnull string which is balanced
- with respect to parentheses. A string with-
- out parentheses is considered balanced. BAL
- matches the shortest string possible.
-
- FAIL Causes failure of this portion of the pattern
- match, causing the scanner to backtrack and
- try alternatives.
-
- FENCE Matches the null string and succeeds when the
- scanner is moving left to right in a pattern,
- but fails if the scanner has to back up
- through it, seeking alternatives.
-
- REM Matches zero or more characters from the cur-
- rent cursor position to the end of the sub-
- ject string.
-
- SUCCEED Matches the null string and always succeeds.
-
- Altering these primitive patterns can produce very confusing
- programs, unless the new value encompasses the old, like this:
-
- ARB = &ARB . OUTPUT
-
-
-
-
-
- Reference - 121 - Patterns
-
-
-
-
-
- 18.2 PRIMITIVE PATTERN FUNCTIONS
-
- These functions produce a pattern based on the argument sup-
- plied. The argument data type is shown below---other data types
- or expressions will be converted to the required type if
- possible.
-
- Pattern functions may be combined with other primitive pat-
- terns, functions, and strings using the alternation and concate-
- nation operators to produce larger patterns.
-
- -----------------------------------------------------------------
-
- ANY(string) Match one character from set
-
- Matches exactly one character from the set of characters speci-
- fied by the argument string.
-
- -----------------------------------------------------------------
-
- ARBNO(pattern) Match repeated pattern
-
- Matches zero or more consecutive occurrences of the string
- matched by the argument pattern. ARBNO matches the shortest
- string possible--initially the null string--and only tries to
- match pattern if other pattern components in the statement re-
- quire it.
-
- -----------------------------------------------------------------
-
- BREAK(string) Match characters not in set
-
- Matches zero or more characters provided they are not in the set
- of characters in the argument string. That is, it matches up to,
- but not including, a character from the argument string.
-
- -----------------------------------------------------------------
-
- LEN(integer) Match fixed length string
-
- Matches a string of the specified length. There are no restric-
- tions on the subject string characters. An argument of zero will
- match the null string.
-
- -----------------------------------------------------------------
-
- NOTANY(string) Match one character not in set
-
- Matches exactly one character provided it is not in the set of
- characters specified by the argument string.
-
-
-
-
-
-
-
- Reference - 122 - Patterns
-
-
-
-
-
- -----------------------------------------------------------------
-
- POS(integer) Verify scanner position
-
- Succeeds if the scanner's current cursor position in the subject
- string is equal to the specified integer value. This function
- merely verifies scanner position---it does not consume or match
- any subject characters. POS(0) as the first component of a pat-
- tern produces an anchored pattern match.
-
- -----------------------------------------------------------------
-
- RPOS(integer) Verify scanner position from end
-
- Succeeds if the scanner's current cursor position in the subject
- string is the specified number of characters from the end of the
- string. Like POS(), it verifies scanner position but does not
- consume any characters. RPOS(0) as the last component of a pat-
- tern forces the pattern to match to the end of the subject
- string.
-
- -----------------------------------------------------------------
-
- RTAB(integer) Match through position counting
- from end
-
- Matches all characters from the current cursor position up to the
- specified cursor position, counting from the end of the subject
- string. RTAB(N) matches characters up to, but not including, the
- final N characters of the subject.
-
- -----------------------------------------------------------------
-
- SPAN(string) Match characters in set
-
- Matches one or more characters from the set of characters speci-
- fied by the argument string. SPAN will not match the null
- string; at least one character from the argument string must be
- found in the subject.
-
- -----------------------------------------------------------------
-
- TAB(integer) Match through fixed position
-
- Matches all characters from the current cursor position up to the
- specified cursor position. TAB(N) matches characters up to, and
- including, the initial N characters of the subject. TAB will
- match the null string if the target position and current cursor
- position are the same. The function fails if the current scanner
- position is to the right of the target position.
-
-
-
-
-
-
-
- Reference - 123 - Patterns
-
-
-
-
-
- Chapter 19
-
-
- BUILT-IN FUNCTIONS
- -----------------------------------------------------------------
-
- In this chapter, the following items are used to indicate the
- required argument type. Other types may be used, and will be
- automatically converted to the required type, if possible. Inte-
- ger suffixes will be used to distinguish multiple arguments of
- the same type.
-
- arg A generic argument of any SNOBOL4 data type.
-
- array An array.
-
- i An integer number.
-
- name The name of a variable, function or label,
- such as .VAR or 'VAR'. When case-folding,
- 'VAR' and 'var' are equivalent as names.
-
- s Any SNOBOL4 string.
-
- table A table.
-
- unit I/O unit; an integer between 1 and 16.
-
- If an argument is omitted in a function call, SNOBOL4 supplies
- the null string instead.
-
- -----------------------------------------------------------------
-
- APPLY(name, arg1, arg2,...,argn) Indirect call to a function
-
- Call function name with the specified arguments. Since name may
- be a variable containing a function name, it allows an indirect
- call to a function, similar to the :($VAR) construct in the GOTO
- field.
-
- -----------------------------------------------------------------
-
- ARG(name, i) Get dummy argument name from
- function definition
-
- Returns a string which is the Ith argument from the formal defi-
- nition of program-defined function name. ARG fails if i is
- greater than the number of arguments in name's definition. ARG
- is useful when one function is used to trace another. The trace
- function can access the actual argument used with the function
- being traced with an indirect reference: $ARG(name, i).
-
-
-
-
-
-
- Reference - 124 - Built-In Functions
-
-
-
-
-
- -----------------------------------------------------------------
-
- ARRAY(s, arg) Create an array
-
- S is a prototype which specifies the dimensions of the array cre-
- ated, and the optional arg is the value used to initialize all
- array elements. The form of the prototype string is:
-
- "L1:H1,L2:H2,...,Ln:Hn"
-
- where L and H are integers giving the lower and upper bounds of
- each dimension. Blanks are not permitted. If the lower bound
- and colon are omitted from any dimension, '1:' is assumed. ARRAY
- returns a pointer to the new array, which should be assigned to a
- variable. The variable can then be subscripted to access the
- array elements.
-
- A common error when defining a multidimensional array is to use
- integers instead of a string for the prototype:
-
- ARRAY(3,4) instead of ARRAY("3,4")
-
- The first example defines a 3-element, one-dimensional array,
- with elements initialized to integer 4. The second defines a
- rectangular array, 3 rows by 4 columns.
-
- -----------------------------------------------------------------
-
- CHAR(i) Convert integer to ASCII
- character
-
- Converts an integer ASCII code to a one-character string. The
- argument must be in the range 0 to 255, otherwise the function
- fails.
-
- -----------------------------------------------------------------
-
- CLEAR() Clear all variables
-
- The null string is assigned to all variables in the system
- (including primitive patterns, such as ARB. These patterns and
- names may be restored from the protected keywords with the same
- names (e.g., ARB = &ARB).
-
- CLEAR does not modify variables which are currently saved on
- the function call stack.
-
- -----------------------------------------------------------------
-
- CODE(s) Compile a string
-
- Returns a pointer to the object code compiled from the SNOBOL4
- statements in string s. This pointer can be assigned to a vari-
- able, and the code executed with the direct GOTO :<variable>.
-
-
-
- Reference - 125 - Built-In Functions
-
-
-
-
-
- CODE fails if it finds a syntax error, and places an error mes-
- sage string in keyword &ERRTEXT. Individual statements in s are
- separated by a semicolon (;). The first character following a
- semicolon must be a blank, tab, the start of a label, or a com-
- ment. Control and continuation statements are not allowed in s.
- Statements may be any length; the 120 character limit when com-
- piling from a file does not apply. Case-folding of names is con-
- trolled by keyword &CASE.
-
- -----------------------------------------------------------------
-
- COLLECT(i) Regenerate storage
-
- This function calls SNOBOL4's garbage collection routine, which
- reclaims all unused storage. It returns an integer result that
- is the number of free descriptors remaining in the work space (a
- descriptor contains 5 bytes of storage). If there are less than
- i free descriptors after regeneration, the function fails.
- SNOBOL4 automatically calls COLLECT whenever memory becomes full.
-
- -----------------------------------------------------------------
-
- CONVERT(arg, s) Convert to specified data type
-
- The argument is converted to the specified data type and returned
- as the value of the function. If conversion is not possible, the
- function fails. S is a data type name string, such as 'STRING',
- 'TABLE', etc. Data type names may be lower case if case-folding
- is active. Chapter 17, "Data Types and Conversion," lists allow-
- able conversions.
-
- -----------------------------------------------------------------
-
- COPY(arg) Make copy of argument
-
- Returns a distinct copy of arg. The argument may be an array,
- code block, pattern, or program-defined data type. If A is an
- array, the statement
-
- B = COPY(A)
-
- creates a new array B, whose initial contents are the same as
- array A. Their elements are independent; altering element A<I>
- does not affect element B<I>. In contrast, the assignment B = A
- makes A and B alternate names for the same array.
-
- -----------------------------------------------------------------
-
- DATA(s) Create new data type
-
- Defines a new data type according to the prototype in string s.
- The prototype assumes a form similar to a function call, with the
- data type taking the place of the function name, and the field
- names replacing the arguments. The form of the prototype string
-
-
-
- Reference - 126 - Built-In Functions
-
-
-
-
-
- is
-
- "NEWTYPE(FIELD1,FIELD2,...,FIELDn)"
-
- The DATA function implicitly defines a new function and n new
- field variables:
-
- NEWTYPE(ARG1,ARG2,...,ARGn) Object creation function.
-
- FIELD1(x) Reference to field variable 1.
-
- . . .
-
- FIELDn(x) Reference to field variable n.
-
- where x is an object created with the NEWTYPE function.
-
- The fields may be of any data type, including pointers to other
- program-defined data items.
-
- -----------------------------------------------------------------
-
- DATATYPE(arg) Get data type of argument
-
- Returns a string specifying the data type of the argument. Some
- typical arguments and their data types are:
-
- 12 INTEGER
-
- 'ABCD' STRING
-
- POS(2) 'C' LEN(3) PATTERN
-
- .Q<3> NAME
-
- *PAT EXPRESSION
-
- If the argument is a program-defined data type, the name from
- the creating DATA() function is returned.
-
- -----------------------------------------------------------------
-
- DATE() Get current date and time
-
- Returns a 20-character string of the form:
-
- 'MM-DD-YY HH:MM:SS.CC'
-
- representing month, day, year, hour, minute, second, and cen-
- tisecond respectively. The centisecond field can only be approx-
- imated, since many personal computer clocks are only updated
- every 55 milliseconds.
-
- -----------------------------------------------------------------
-
-
-
- Reference - 127 - Built-In Functions
-
-
-
-
-
- DEFINE(s, name) Create program-defined function
-
- This function creates a new, program-defined function. S is a
- prototype string specifying the function's name, arguments, and
- local variables, if any. Name is optional, and specifies a label
- as the first statement of the function body. If absent, a label
- with the same name as the function is the assumed entry point.
- The form of the prototype string is
-
- "FNAME(ARG1,ARG2,...,ARGn)LOCAL1,LOCAL2,...,LOCALn"
-
- where FNAME is the name of the function, and ARGi are names of
- formal arguments to the function. Blanks are not permitted in
- the prototype. The values of variables specified in the list of
- locals are saved prior to function entry, and restored upon func-
- tion return.
-
- Functions may return a value or variable name by assigning the
- result to a variable with the same name as the function. Func-
- tions return by transferring to one of the reserved labels
- RETURN, NRETURN, or FRETURN to return by value, by name, or to
- fail respectively.
-
- -----------------------------------------------------------------
-
- DETACH(name) Remove I/O association
-
- Removes any input or output unit associated with the variable
- name. The underlying file is not affected in any way. Remember
- that name is the address of the variable (e.g. .X or 'X'), not
- the variable itself.
-
- -----------------------------------------------------------------
-
- DIFFER(arg1, arg2) Check if arguments are different
-
- Succeeds and returns the null string if and only if arg1 and arg2
- are different. Strings and integers are different if they have
- unequal values. Other data types contain pointers to the actual
- data object, and differ only if the pointers are different. If
- arg2 is omitted, DIFFER succeeds if arg1 is not null.
-
- -----------------------------------------------------------------
-
- DUMP(i) Dump variables
-
- This function causes all natural variables with nonnull values to
- be listed on the file associated with I/O unit 6 (normally
- OUTPUT). If i is zero, the dump does not occur.
-
-
-
-
-
-
-
-
- Reference - 128 - Built-In Functions
-
-
-
-
-
- -----------------------------------------------------------------
-
- DUPL(s, i) Duplicate string
-
- Returns the argument string s repeated i times. The function
- returns the null string if i is zero, and fails if i is negative.
-
- -----------------------------------------------------------------
-
- ENDFILE(unit) Close file
-
- The file attached to the specified I/O unit is closed, and the
- file buffer is flushed and released. All variables which have
- been associated with this unit have their association removed.
- Upon program termination, SNOBOL4 will automatically perform an
- ENDFILE function on all open units.
-
- -----------------------------------------------------------------
-
- EQ(i1, i2) Equality test for numbers
-
- This function succeeds and returns the null string if the two
- integer arguments are equal. I1 and i2 must evaluate to integer
- values. The function fails if i1 is not equal to i2.
-
- -----------------------------------------------------------------
-
- EVAL(s or n) Compile and evaluate expression
-
- If the argument is a string, it should contain a valid SNOBOL4
- expression to be compiled and evaluated. The evaluation result
- is returned as the value of the function. EVAL fails and sets
- &ERRTEXT to an error message string if s contains a syntactic er-
- ror. If the argument is a number, i, it is returned unchanged.
- If the argument is an unevaluated expression, it is evaluated,
- and the result returned.
-
- -----------------------------------------------------------------
-
- FIELD(s, i) Get field name of defined data
- type
-
- Returns a string which is the Ith field name from the formal def-
- inition of the program-defined data type whose name is in string
- s. FIELD fails if i is greater than the number of fields in the
- data type's definition.
-
- -----------------------------------------------------------------
-
- GE(i1, i2) Greater than or equal test for
- numbers
-
- This function succeeds and returns the null string if the two
- integer arguments satisfy the relationship i1 >= i2. I1 and i2
-
-
-
- Reference - 129 - Built-In Functions
-
-
-
-
-
- must evaluate to integer values. The function fails if i1 is
- less than i2.
-
- -----------------------------------------------------------------
-
- GT(i1,i2) Greater than test for numbers
-
- This function succeeds and returns the null string if the two in-
- teger arguments satisfy the relationship i1 > i2. I1 and i2 must
- evaluate to integer values. The function fails if i1 is less
- than or equal to i2.
-
- -----------------------------------------------------------------
-
- IDENT(arg1, arg2) Check if arguments are identical
-
- Succeeds and returns the null string if and only if arg1 and arg2
- are identical. Strings and integers re identical if they have
- the same values. Other data types contain pointers to the actual
- data object, and are identical only if they point to the same
- object. If arg2 is omitted, IDENT succeeds if arg1 is the null
- string.
-
- -----------------------------------------------------------------
-
- INPUT(Name, Unit, i, s) Open file for input
-
- This function opens a file for input, and associates it with a
- variable. Data may then be read from the file by using the vari-
- able in an expression or an assignment statement.
-
- The file designated by string S is opened for input and given
- the specified unit number. I is an optional record length. The
- variable specified by Name is associated with this unit.
-
- The first argument, Name, specifies a SNOBOL4 variable, typi-
- cally as a quoted string or with the unary period operator:
-
- INPUT('IN', ...
- INPUT(.IN, ...
- X = 'IN'
- INPUT(X, ...
-
- The second argument, Unit, must evaluate to an integer value in
- the range 0 to 16 inclusive. Unit 0 (or omitting the unit argu-
- ment) will select the default input unit, 5.
-
- The third argument, I, contains the record length in charac-
- ters. 0 < I <= &MAXLNGTH. If omitted, the default is 80.
-
- The fourth argument, S, is a string containing the name of the
- file to be opened. If the file is a disk file, S may contain an
- optional drive letter and pathname in addition to the filename.
- Besides disk files, MS-DOS device names such as NUL:, CON:,
-
-
-
- Reference - 130 - Built-In Functions
-
-
-
-
-
- COM2:, etc., are permitted.
-
- If S is absent or null, and this unit is not currently open,
- the SNOBOL4 command line is searched for a file to use with this
- unit (/n:file). If S is absent, but the unit is already open,
- the INPUT call serves only to establish another association
- between a variable and the unit. If S is not null, any file pre-
- viously associated with this unit number is first closed by
- SNOBOL4 with an implicit ENDFILE(unit).
-
- An error message is generated for an illegal unit number. The
- INPUT function fails (with no printed error message) if the file
- cannot be opened.
-
- The record length I (or its default value, 80), determines the
- number of characters returned in a string when the associated
- variable is referenced. ASCII files will return I characters or
- less if an end-of-line condition is encountered. End-of-line is
- defined as either a carriage return, or a carriage return fol-
- lowed by a line feed. If I characters are read from an ASCII
- file without encountering an end-of-line, additional characters
- are read from the file and discarded until the end-of-line char-
- acter(s) are found. That is, long lines are truncated. If less
- than I characters are read from an ASCII file, and keyword &TRIM
- is zero, the line will be padded with blank characters until
- length i is obtained.
-
- A read operation will terminate on the last character of a disk
- file, returning a short record. Reading past the End-of-File
- will cause statement failure. If the file is ASCII, reading a
- control-Z character will be treated as an End-of-File.
-
- Note: When program begins execution, the variable INPUT is
- associated with unit 5. Unit 5 is normally device CON:, the key-
- board, unless redirected elsewhere by the /I=file command line
- option, or the MS-DOS redirection operation (<file).
-
- -----------------------------------------------------------------
-
- INTEGER(arg) Check if argument is an integer
-
- Succeeds and returns the null string if arg is an integer, or a
- string which can be converted to an integer. If the argument is
- not an integer, the function fails.
-
- -----------------------------------------------------------------
-
- ITEM(array, i1, i2, ..., in) Get array element
-
- ITEM(table, arg) Get table element
-
- Returns the specified array or table element. I1, i2, ..., in
- are array subscripts, and arg is a table subscript. Since the
- first argument may be a function which returns an array or table
-
-
-
- Reference - 131 - Built-In Functions
-
-
-
-
-
- name, it allows an indirect reference in situations that would
- not be syntactically valid. ITEM is an analog of the APPLY func-
- tion. For example, if F(X) is a program-defined function which
- returns an array name,
-
- ITEM(F(X), 20)
-
- references the 20th element of that array, whereas F(X)<20> is
- not acceptable.
-
- -----------------------------------------------------------------
-
- LE(i1, i2) Less than or equal test for
- numbers
-
- This function succeeds and returns the null string if the two
- integer arguments satisfy the relationship i1 <= i2. I1 and i2
- must evaluate to integer values. The function fails if i1 is
- greater than i2.
-
- -----------------------------------------------------------------
-
- LGT(s1, s2) Lexically greater than test for
- strings
-
- This function succeeds and returns the null string if s1 is lexi-
- cally greater than s2 (according to their alphabetic ordering).
- The two strings are compared left to right, character by charac-
- ter. If one string is exhausted before the other---with all
- characters equal---the longer string is lexically greater than
- the shorter string. The null string is lexically less than any
- other non-null string. If there is a character mismatch at the
- same position in both strings, the relationship between the char-
- acters determines the relationship of the strings. Strings are
- equal only if they are the same length, and are identical charac-
- ter by character.
-
- -----------------------------------------------------------------
-
- LOCAL(name, i) Get local variable name from
- function definition
-
- Returns a string which is the Ith local variable from the formal
- definition of program-defined function name. LOCAL fails if i is
- greater than the number of local variables in name's definition.
- LOCAL is useful when one function is used to trace another. The
- trace function can access the local variables used with the func-
- ion being traced with an indirect reference: $LOCAL(name, i).
-
-
-
-
-
-
-
-
-
- Reference - 132 - Built-In Functions
-
-
-
-
-
- -----------------------------------------------------------------
-
- LPAD(s1, i, s2) Pad left end of string
-
- This function is useful for right-justifying columnar output. It
- returns s1 padded on its left end until its total size is i char-
- acters. The pad character used is the first character of s2 if
- present, otherwise a blank is used if s2 is absent or null. If i
- is less than or equal to the length of s1, s1 is returned un-
- changed.
-
- -----------------------------------------------------------------
-
- LT(i1, i2) Less than test for numbers
-
- This function succeeds and returns the null string if the two in-
- teger arguments satisfy the relationship i1 < i2. I1 and i2 must
- evaluate to integer values. The function fails if i1 is greater
- than or equal to i2.
-
- -----------------------------------------------------------------
-
- NE(i1, i2) Not equal test for numbers
-
- This function succeeds and returns the null string if the two in-
- teger arguments are not equal. I1 and i2 must evaluate to inte-
- ger values. The function fails if i1 is equal to i2.
-
- -----------------------------------------------------------------
-
- OPSYN(s1, s2, i) Create operator synonym
-
- The function or operator name s1 becomes a synonym for s2. If i
- is absent or 0, both strings are assumed to be function names.
- If i is 1 or 2, then the strings are assumed to be unary or
- binary operators, respectively. Other values for i are illegal.
- Operators are specified by using their graphic symbol in a quoted
- literal, such as:
-
- OPSYN('#', '/', 2)
-
- The concatenation operator is specified as a one-character
- string containing a blank: ' '. The implicit pattern match oper-
- ator between subject and pattern cannot be OPSYNed.
-
- -----------------------------------------------------------------
-
- OUTPUT(name, unit, i, s) Open file for output
-
- This function opens a file for output, and associates it with a
- variable. Data may then be written to the file by assigning val-
- ues to the variable.
-
- The description of the OUTPUT function parallels that of the
-
-
-
- Reference - 133 - Built-In Functions
-
-
-
-
-
- INPUT function, and will not be duplicated here. The following
- differences are noted below.
-
- If the output file already exists, it is deleted and recreated
- anew. Facilities for updating existing files (direct-access
- files) are not present in Vanilla SNOBOL4; they are contained in
- SNOBOL4+, Catspaw's enhanced implementation of the SNOBOL4 lan-
- guage.
-
- When an output variable is assigned a string value, the string
- is written to the associated file. A carriage return and line
- feed appended to the string. If the string is longer than the
- record length (i, or the default, 80), a carriage return and line
- feed will be inserted every i characters. That is, long strings
- will create multiple output lines.
-
- Note: When a program begins execution, the variable OUTPUT is
- associated with unit 6. Unit 6 is normally device CON:, the dis-
- play, unless redirected elsewhere by the /O: command line option
- or the MS-DOS redirection operation (>file). The variable SCREEN
- is associated with unit 7, which is also attached to device CON:.
-
- -----------------------------------------------------------------
-
- PROTOTYPE(array) Get prototype which created an
- array
-
- Returns the prototype string of dimensions used to create the
- specified array. If the array was created by the ARRAY function,
- then the string returned is identical to the first argument of
- the original ARRAY function call. If the array was produced from
- a table by the CONVERT function, the string has the form 'N,2',
- where N is the integer number of rows in the array.
-
- -----------------------------------------------------------------
-
- REMDR(i1, i2) Get remainder after division
-
- REMDR returns the integer remainder resulting from i1 divided by
- i2, that is, i1 modulus i2. The result has the same sign as i1.
-
- -----------------------------------------------------------------
-
- REPLACE(s1, s2, s3) Replace characters in string
-
- This function returns s1 transformed according to a translation
- specified by s2 and s3. Each character of s1 found in s2 is re-
- placed by the corresponding character in s3. S2 and s3 must be
- the same length. If duplicate characters appear in s2, the
- rightmost one is used to obtain the mapping character from s3.
- Normally, s2 and s3 are thought of as parameters, and REPLACE
- performs character substitutions on the variable s1. For
- instance:
-
-
-
-
- Reference - 134 - Built-In Functions
-
-
-
-
-
- REPLACE(S, 'aeiouAEIOU', '1234512345')
-
- replaces all upper- and lower-case vowels in S with the digits 1
- through 5. It is possible to use REPLACE as a "transposition"
- function if s1 and s2 are considered parameters, and s3 allowed
- to vary. If s1 and s2 are the same length, a simple positional
- transformation results. For example,
-
- REPLACE('123456', '214365', S)
-
- returns the six character string S with adjacent pairs of charac-
- ters interchanged ('ABCDEF' becomes 'BADCFE'). S1 and s2 can be
- different lengths---only s2 and s3 must be the same size. If s2
- contains characters not in s1, the corresponding characters in s3
- are dropped from the result. If s1 contains characters not in
- s2, they will appear in the result. The function call
-
- REPLACE('Yy/Mm/Dd', 'Mm-Dd-Yy xx:xx:xx.xx', DATE())
-
- returns the date in the form YY/MM/DD (e.g., 87/07/28). Dupli-
- cate characters in s1 are permitted, so:
-
- REPLACE('aaabbbccc', 'abc' '(1)')
-
- produces '(((111)))'.
-
- -----------------------------------------------------------------
-
- RPAD(s1, i, s2) Pad right end of string
-
- This function is useful for left-justifying columnar output. It
- returns s1 padded on its right end until its total size is i
- characters. The pad character used is the first character of s2
- if present, otherwise a blank (ASCII character 32) is used if s2
- is absent or null. If i is less than or equal to the length of
- s1, s1 is returned unchanged.
-
- -----------------------------------------------------------------
-
- SIZE(s) Get length of string
-
- The function SIZE returns an integer value which is the number of
- characters in its argument string. A null string argument
- returns 0.
-
- -----------------------------------------------------------------
-
- STOPTR(name, type) Stop trace
-
- Discontinues the type of trace of the named item. Consult the
- TRACE() function for a list of tracing types available.
-
-
-
-
-
-
- Reference - 135 - Built-In Functions
-
-
-
-
-
- -----------------------------------------------------------------
-
- TABLE(i1, i2) Create a table
-
- A table is similar to a one-dimensional array, but the subscripts
- may be any SNOBOL4 data type. The TABLE function creates a table
- and returns a pointer to it. The integer i1 specifies the ini-
- tial number of entries in the table. Integer i2 specifies the
- size by which the table is increased whenever it becomes full,
- and additional table space is required. If either is omitted, 10
- is used as a default value.
-
- -----------------------------------------------------------------
-
- TIME() Get execution time
-
- Returns the number of tenths of a second elapsed since the start
- of program execution, including all I/O wait time.
-
- -----------------------------------------------------------------
-
- TRACE(name1, type, s, name2) Trace an entity
-
- The item name1 is traced according to the action specified by
- type. Trace output is written to the file associated with I/O
- unit 6.
-
- Name1 is a the name of a variable, function, statement label,
- or keyword. It may appear as a string, or specified with the
- unary name operator (.).
-
- Type is a string that determines the type of trace desired. It
- must be one of these values:
-
- 'VALUE' When value of name1 is changed (default if
- type omitted).
-
- 'CALL' When function name1 is called.
-
- 'RETURN' When function name1 returns.
-
- 'FUNCTION' When function name1 is called, or returns.
-
- 'LABEL' When control is transferred to label name1.
-
- 'KEYWORD' When the value of keyword &name1 is changed.
- Note that the ampersand character (&) is not
- included in the first argument, name1.
-
- S is an optional identifying tag that is added to the trace
- output line when name1 is a created object, such as an array or
- table element.
-
- Name2 is an optional name of a program-defined function.
-
-
-
- Reference - 136 - Built-In Functions
-
-
-
-
-
- Instead of producing a trace output line, this function is called
- when the trace action occurs. The function is called with name1
- as the first argument, and string s as the second argument.
-
- Tracing will only occur when the keyword &TRACE is nonzero.
- Each trace will decrement &TRACE by one. Tracing ends when it
- becomes zero.
-
- -----------------------------------------------------------------
-
- TRIM(s) Remove trailing blanks
-
- Returns the argument string with trailing blanks removed. Trail-
- ing tab characters are not affected. If the argument string was
- read from an input file, it is more efficient to set keyword
- &TRIM nonzero than to use TRIM(INPUT).
-
- By combining function TRIM with REPLACE, any trailing character
- can be removed. The desired character is temporarily exchanged
- with blank, trimmed, then exchanged back. For example, this
- expression returns string S with trailing zeros removed:
-
- REPLACE(TRIM(REPLACE(S,'0 ',' 0')),'0 ',' 0')
-
- -----------------------------------------------------------------
-
- UNLOAD(name) Remove function definition
-
- The function name becomes undefined.
-
- -----------------------------------------------------------------
-
- VALUE(name) Get value of an object
-
- The VALUE function returns the value of the variable name,
- behaving like the unary indirect operator ($).
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 137 - Built-In Functions
-
-
-
-
-
- Chapter 20
-
-
- SYSTEM MESSAGES
- -----------------------------------------------------------------
-
- This chapter lists all messages displayed by SNOBOL4.
-
-
- 20.1 INITIAL MESSAGES
-
- When SNOBOL4 begins execution, this title is displayed:
-
- Vanilla SNOBOL4 Version 2.14.
- (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
-
- Additional messages which may appear:
-
- Cannot open file: name
- The file specified in the command line cannot be opened.
-
- Command line error:
- A syntactic error was detected in the SNOBOL4 command line. The
- command line is displayed on two lines. The line break shows
- where the error occurred.
-
- Errors detected in source program
- There were compilation errors in the source program. Execution
- will proceed until a statement with a compilation error is
- encountered.
-
- Insufficient storage for initialization
- Not enough memory was available to initialize the SNOBOL4 system.
-
- No errors
- Compilation is complete, and without error. Execution begins
- immediately.
-
-
- 20.2 TERMINATION MESSAGES
-
- Termination messages are normally produced on I/O unit 7, which
- defaults to the user's display screen. If the /B option was used
- in the invoking command line, they are produced on I/O unit 6,
- associated with variable OUTPUT. Dump messages are always pro-
- duced to unit 6.
-
- Normal termination at level LL
- The program transferred to the label END. LL is the current
- program-defined function call depth. This message is produced
- only if the /S command line option (statistics) was used.
-
-
-
-
-
-
- Reference - 138 - System Messages
-
-
-
-
-
- filename(XXX) : Last statement executed was NNN
- NNN is the statement number of the last statement executed, XXX
- is its source line number. It is the statement that transferred
- to the END statement. If this was a normal termination, it is
- only displayed if the /S option was used.
-
- filename(XXX) : Warning: Interrupted in statement NNN at level LL
- Execution was interrupted when you pressed the BREAK or control-C
- key. The interruption occurred before the specified statement
- was executed. LL is the current call depth of program-defined
- functions.
-
- Incomplete storage regeneration. Terminal dump not possible
- Stack overflow occurred during storage regeneration, and the
- &DUMP keyword was nonzero. Memory is in an indeterminate form,
- and a dump listing cannot be produced.
-
- Dump of variables at termination
- Natural variables
- Unprotected keywords
- These headings will appear if a termination dump was requested by
- setting the &DUMP keyword nonzero. Variables are listed only if
- they contain a nonnull value. The variable names will be sorted
- if the &DUMP keyword is positive; they are unsorted if it is
- negative.
-
-
- 20.2.1 Job Statistics
-
- End-of-run statistics on program execution are provided if the
- /S command line option is used. Compilation and execution times
- are in tenths of a second. Times are wall-clock values, and
- include all I/O wait time, such as delays for keyboard input:
-
- SNOBOL4 statistics summary-
- NN tenths of a second compilation time
- NN tenths of a second execution time
- NN statements executed, NN failed
- NN arithmetic operations performed
- NN pattern matches performed
- NN regenerations of dynamic storage
- NN reads performed
- NN writes performed
-
-
- 20.3 COMPILATION MESSAGES
-
- SNOBOL4 syntax errors are detected during compilation. State-
- ment compilation ceases at the point where the error was de-
- tected. The error message contains a marker which indicates the
- valid portion of the statement accepted by the compiler---the
- error occurred after this point. Only the first error in a
- statement is detected. The erroneous statement is compiled with
- an internal error code which produces an error message if the
-
-
-
- Reference - 139 - System Messages
-
-
-
-
-
- statement is executed. Compilation resumes with the next state-
- ment. Compilation ceases and SNOBOL4 terminates if more than 50
- errors are found.
-
- When compiling without a list file (/L: command line option),
- the compiler will attempt to display the erroneous line on your
- screen. If a statement is continued over several lines, only the
- line in error is displayed. Several errors cannot be detected
- until the absolute end-of-statement is found. This may require
- reading the next line, and finding it is NOT a continuation
- statement. In this case, the single line displayed will be the
- NEXT line, with the error marker in the first character position.
-
- The CODE function may be used to compile SNOBOL4 statements
- that have been concatenated into a long string. The CODE func-
- tion fails if a syntax error is found, and the keyword &ERRTEXT
- contains the error message string for the error encountered.
-
- Binary operators must be surrounded by blanks
- Omitting a blank will often cause this error. An illegal or
- undefined binary operator will also produce this error.
-
- Error in GOTO
- There is a syntactic error in the GOTO field.
-
- Erroneous END statement
- The END statement contains a syntactic error, or the label speci-
- fied in the subject field for initial transfer could not be
- found.
-
- Erroneous integer
- An integer number appears which is too large for the SNOBOL4 sys-
- tem. The allowable range for magnitude values is 0 to 32767.
-
- Erroneous label
- The first character of a statement must be blank, tab, alphanu-
- meric, * (comment), + or . (continuation), or - (control).
-
- Erroneous or missing break character
- A character which separates language elements occurs in an ille-
- gal context, or an expression is not balanced with respect to
- parentheses.
-
- Erroneous subject
- A compiler break character appears before the statement subject
- field. The break characters are comma, equal sign, right paren-
- thesis, right square bracket (]), and right angular bracket (>).
-
- Illegal character in element
- A character was found which was incorrect for the type of lan-
- guage object being compiled. This often occurs when a blank is
- omitted between elements, causing them to run together.
-
-
-
-
-
- Reference - 140 - System Messages
-
-
-
-
-
- Improperly terminated statement
- The source statement terminated with an incomplete language con-
- struction.
-
- Limit on compilation errors exceeded
- More than 50 compilation errors were found in the source program.
-
- No END statement in source file
- End-of-File was encountered in the source file without an END
- statement.
-
- Previously defined label
- A duplicate label appears. The first definition is retained;
- subsequent definitions are discarded.
-
- Unclosed literal
- The closing quotation mark from a literal string is missing.
- This error also occurs if the closing quotation mark (single or
- double) was different from the opening mark.
-
-
- 20.4 EXECUTION ERROR MESSAGES
-
- Most program logic errors can only be detected during program
- execution. Some are unconditionally fatal, and cause the SNOBOL4
- system to terminate. Others are conditionally fatal---the system
- terminates if the value of the keyword &ERRLIMIT is zero. If
- &ERRLIMIT is nonzero, the keyword &ERRTYPE is set to the error
- message number, &ERRTEXT is set to the message text, &ERRLIMIT is
- decremented, and execution continues.
-
- The protected keyword &ERRTYPE may be traced, permitting a
- program-defined function to gain control when a conditional error
- occurs. THe program CODE.SNO provides an example of how to do
- this. The initial value of the unprotected keyword &ERRLIMIT is
- zero, forcing program termination upon any error.
-
- Errors 1-16 are conditionally fatal. Errors 17-28 are uncondi-
- tionally fatal. When execution terminates due to an error, the
- following is displayed:
-
- filename(XXX) : Error NN, -- description --
- In statement NNN, at level LL
-
- NN is the error number below. NNN is the statement number
- assigned in the compiler list file, XXX is the absolute line
- number in the source file. LL specifies the current program-
- defined function call depth (0 is the normal main-program level).
-
- 1. Illegal data type
- The data type of an operand was incorrect for the type of opera-
- tion attempted. This occurs most frequently with arithmetic op-
- erations, when one operand is a string which cannot be converted
- to a number.
-
-
-
- Reference - 141 - System Messages
-
-
-
-
-
- 2. Error in arithmetic operation
- An arithmetic operation upon integer values produced a result
- which was out of range, or was undefined, such as division by
- zero.
-
- 3. Erroneous array or table reference
- An array or table reference was made to a variable which did not
- contain an array or table pointer.
-
- 4. Null string in illegal context
- The null string appeared where it is not permitted, such as the
- object of an indirect reference.
-
- 5. Undefined function or operation
- A function was called before it was defined, or an undefined
- operator was used.
-
- 6. Erroneous prototype
- A syntactic error occurred in the prototype string used with the
- functions ARRAY, DATA or DEFINE. Note that the blank and tab
- characters are not permitted within the prototype string.
-
- 7. Unknown keyword
- The keyword specified is unknown to the SNOBOL4 system.
-
- 8. Variable not present where required
- A variable name must be used as the subject of an assignment
- statement, or as the argument of the unary cursor, name, or key-
- word operator (@, ., &), or the binary pattern match assignment
- operators (., $).
-
- 9. Entry point of function not label
- At the time a program-defined function was first called, its
- entry point label did not appear as the label of any SNOBOL4
- statement.
-
- 10. Illegal argument to primitive function
- An illegal value was used as an argument to the function ARG,
- FIELD, LOCAL, OPSYN, STOPTR, or TRACE, or an illegal value was
- specified in the third argument to INPUT or OUTPUT.
-
- 11. Reading error
- An error condition was returned when reading from a file.
-
- 12. Illegal I/O unit
- Allowable unit numbers are 1 through 16 (inclusive). (Unit 0 is
- allowed in functions INPUT and OUTPUT, and is converted to units
- 5 and 6 respectively.)
-
- 13. Limit on defined data types exceeded
- SNOBOL4 allows 899 different program-defined data types.
-
-
-
-
-
-
- Reference - 142 - System Messages
-
-
-
-
-
- 14. Negative number in illegal context
- A negative number was used incorrectly as the argument of the
- function LEN, POS, TAB, or RTAB.
-
- 15. String overflow
- The program attempted to create a string larger than &MAXLNGTH
- characters.
-
- 16. Overflow during pattern matching
- The internal SNOBOL4 stack overflowed during pattern matching.
- This can happen when a recursive or looping pattern is incor-
- rectly specified.
-
- 17. Error in SNOBOL4 system
- This message indicates an internal SNOBOL4 system error.
-
- 18. Return from level zero
- An attempt was made to transfer to the function return label
- RETURN, FRETURN, or NRETURN outside of any function call.
-
- 19. Failure during GOTO evaluation
- The expression used for an indirect transfer within the GOTO
- field failed when evaluated.
-
- 20. Insufficient storage to continue
- All available memory has been used. Vanilla SNOBOL4 is limited
- to 30K bytes for program and data. SNOBOL4+, Catspaw's enhanced
- version, allocates 300K bytes for program and data.
-
- 21. Stack overflow
- The SNOBOL4 internal stack has overflowed. This may be caused by
- excessive function recursion, or occur during memory garbage
- collection.
-
- 22. Limit on statement execution exceeded
- The number of statements executed was greater than the value in
- the keyword &STLIMIT. &STLIMIT is initially -1, specifying
- unlimited execution.
-
- 23. Object exceeds size limit
- The program attempted to create an object larger than the maximum
- size allowed.
-
- 24. Undefined or erroneous GOTO
- A transfer was attempted to an undefined label, or an expression
- in a GOTO field evaluated to a string, rather than a label
- name---usually the result of omitting the indirect operator ($).
-
- 25. Incorrect number of arguments
- A primitive function was called with too many arguments.
-
- 28. Execution of statement with compilation error
- Execution proceeded to a statement that contained a compilation
- error.
-
-
-
- Reference - 143 - System Messages
-
-
-
-
-
- 20.5 EXECUTION TRACE MESSAGES
-
- Tracing is provided for variables, certain keywords, label
- transfers, and function calls and returns. A trace message is
- output to I/O unit 6 for each trace occurrence. Program execu-
- tion time, in tenths of a second, is appended to each message.
-
- Tracing normally occurs only if the keyword &TRACE is nonzero.
- However, another keyword, &FTRACE, may be set nonzero to trace
- all function calls and returns independently of keyword &TRACE.
-
- STATEMENT NN: <vname> = <value>,TIME = TT
- Value trace; produced by the function call TRACE('vname',
- 'VALUE'), where vname is the name of the variable to be traced.
-
- STATEMENT NN: &<keyname> = <value>,TIME = TT
- Keyword trace; produced by the function call TRACE('keyname',
- 'KEYWORD'), where keyname is the upper case keyword name, without
- the leading ampersand.
-
- STATEMENT NN: TRANSFER TO <labname>,TIME = TT
- Label trace; produced by the function call TRACE('labname',
- 'LABEL'), where labname is the desired label name. Tracing only
- occurs on a transfer of control; it does not occur if the labeled
- statement is flowed into.
-
- STATEMENT NN: LEVEL LL CALL OF <fname>(arg1,...,argn),TIME = TT
- Call trace; produced by the function call TRACE('fname', 'CALL'),
- where fname is the name of the function to be traced. The func-
- tion's arguments at the time of the call are evaluated and dis-
- played.
-
- STATEMENT NN: LEVEL LL RETURN OF <fname> = <value>,TIME = TT
- STATEMENT NN: LEVEL LL NRETURN OF <fname> = <value>,TIME = TT
- STATEMENT NN: LEVEL LL FRETURN OF <fname>,TIME = TT
- Return trace; produced by the function call TRACE('fname',
- 'RETURN'), where fname is the name of the function whose return
- is to be traced. The type of return that occurred is displayed
- in the trace message.
-
- ***Print request too long***
- An internal buffer is used to display trace messages, and vari-
- able values during dumps. If the required display is longer than
- 1,800 characters, this error message is produced instead.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Reference - 144 - System Messages
-