Overview: Encoded Streams
Bojangles supports a variety of encoded data streams for representing
program data. This note is an overview of the various encoding schemes
and the software structure provided to support them.
Delimited streams
The basic Bojangles interface to external data representation is via
EDStreams. These streams allow high-level operations like readDouble
or writeManaged, which requires that they be able to parse their
contents into separate values. In addition, these streams support
sub-streams set off by begin and end indications, so the EDStreams
must also represent the begin and end points of sub-streams. The
general scheme taken for encoded data streams is to use special
characters that cannot occur in data values (unless they are escaped)
to separate values and to mark the begin and end of sub-streams. A
pseudo-BNF grammar for these streams is:
stream = dtoken* EOF
dtoken = token separator | begin | end
begin = ; a single specified character
end = ; a single specified character
separator = ; a single specified character
token = ; any sequence of characters including escaped
; versions of the begin, end, and separator
; characters represented in a stream specific
; encoding.
For example, assume that the begin, end, and separator characters
are "{", "}", and ";" respectively. Then the following are possible
encoded representations of data.
23; 42 ; -209.67865 ; A String ; true ; x; EOF
Representing the values:
int 23
int 42
double -209.67865
String "A String"
boolean true
char 'x'
And,
23 ; { 42; 75 ; 10 ; } 45; EOF
Represents a stream with a sub-stream containing: 42, 75, and 10.
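The grammar above can be exercised with a small tokenizer. The
following is a minimal sketch, not the EDStream implementation. The
class name DelimitedSketch and its tokenize method are invented for
illustration. The delimiters are the "{", "}", ";" of the examples, a
backslash is assumed to be the escape character, and white space
around values is assumed to be ignorable. Escape pairs are copied
through undecoded so that an escaped delimiter stays distinguishable
from a real one.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a delimited stream into values and begin/end marks. */
final class DelimitedSketch {
    // Delimiters taken from the examples; backslash as escape is an assumption.
    static final char BEGIN = '{', END = '}', SEPARATOR = ';', ESCAPE = '\\';

    /** Returns the values in order; "{" and "}" entries mark sub-stream boundaries. */
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        int keep = 0; // token length up to the last non-blank or escaped character
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ESCAPE && i + 1 < input.length()) {
                // Copy the escape pair through unchanged; decoding is a later step.
                token.append(c).append(input.charAt(++i));
                keep = token.length();
            } else if (c == SEPARATOR) {
                token.setLength(keep); // drop trailing unescaped white space
                out.add(token.toString());
                token.setLength(0);
                keep = 0;
            } else if (c == BEGIN || c == END) {
                out.add(String.valueOf(c));
                token.setLength(0);
                keep = 0;
            } else if (Character.isWhitespace(c)) {
                if (token.length() > 0) {
                    token.append(c); // interior blanks stay, leading ones are skipped
                }
            } else {
                token.append(c);
                keep = token.length();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("23 ; { 42; 75 ; 10 ; } 45; "));
    }
}
```

Under these assumptions, the second example stream tokenizes to the
list [23, {, 42, 75, 10, }, 45]. The EOF in the examples stands for
end of input and is not part of the string passed in.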
Encoding of Object data
The MOFW EDStreams support reading and writing objects as well as
simple values. The encoding of objects is a little more complex than
that of other values. In addition, EDStreams support two ways of
reading and writing objects: via special methods (e.g., writeString,
readManageable) and via generic methods (readObject,
writeObject). The two types of methods produce the same encodings,
except that the generic encodings also prepend a type code (as shown
below) before each value.
Four cases are supported. Each has an encoding that consists of one
or more separate values, as follows (only the generic encodings are
shown; the encodings produced by the specific methods are the same
without the leading type code):
ASCII Encoding
ASCII encoded streams are restricted to the printable 7-bit ASCII
values (32-126 inclusive) plus newline and return. Values other than
strings and characters are represented as they would be in Java source
code. E.g., booleans are represented by "true" or "false", ints are
represented as base 10 numbers, etc.
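As a rough illustration of "represented as they would be in Java
source code", the sketch below writes a few simple values using the
standard Java string conversions and appends the default ";"
separator. SimpleValueSketch and its methods are hypothetical names
chosen for this note. They are not the ASCIIOutputEDStream API, and
the real streams may format values differently in detail.

```java
/** Illustrative only: simple values as ASCII tokens followed by a separator. */
final class SimpleValueSketch {
    static final char SEPARATOR = ';'; // default separator from this note

    static String write(int v) {
        return Integer.toString(v) + SEPARATOR; // base 10
    }

    static String write(double v) {
        return Double.toString(v) + SEPARATOR; // parseable by Double.parseDouble
    }

    static String write(boolean v) {
        return Boolean.toString(v) + SEPARATOR; // "true" or "false"
    }

    /** Reads an int token back, ignoring surrounding white space. */
    static int readInt(String token) {
        return Integer.parseInt(token.trim());
    }

    public static void main(String[] args) {
        System.out.println(write(23) + " " + write(true) + " " + write(-209.67865));
    }
}
```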
In string and character values, a set of escapes is allowed to
encode any UNICODE character value. The supported escapes are:
\n newline
\r return
\f form feed
\t horizontal tab
\uxxxx (where xxxx is four hex digits) for any UNICODE character.
In addition, any printable ASCII character can be escaped to prevent
it from being recognized as a delimiter (see the discussion above).
For example:
23; \ This string starts with a space and contains a \;. ; \u7865; EOF
represents a stream with three values, 23,
" This string starts with a space and contains a ;.", and the UNICODE
character with hex value 7865.
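The escape rules above can be sketched as a small decoder. This is an
assumption-laden illustration rather than the code used by
ASCIIInputEDStream. AsciiEscapeSketch and decode are hypothetical
names. The named escapes (n, r, f, t, u) are assumed to take
precedence over the "escape any printable character" rule, and a
malformed four-digit hex escape is assumed to be an error.

```java
/** Illustrative only: decodes the escapes listed above within one token. */
final class AsciiEscapeSketch {
    static String decode(String token) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c != '\\' || i + 1 >= token.length()) {
                out.append(c); // ordinary character (or a lone trailing backslash)
                continue;
            }
            char e = token.charAt(++i); // the character after the backslash
            switch (e) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 'f': out.append('\f'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Backslash-u form: exactly four hex digits must follow.
                    if (i + 4 >= token.length()) {
                        throw new IllegalArgumentException("truncated hex escape");
                    }
                    out.append((char) Integer.parseInt(token.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append(e); // any other escaped character stands for itself
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + decode("\\ starts with a space, contains a \\;.") + "]");
    }
}
```

Applied to the second token of the example, decode returns the string
that starts with a space and contains a bare ";". The six-character
token for hex value 7865 decodes to a single character.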
Using the ASCII EDStreams
These streams are ASCIIInputEDStream and ASCIIOutputEDStream.
Four constructors are provided for both input and output ASCII
streams. Basically, these constructors support the four combinations
of two choices: (1) what should be used as the source/sink for
characters, and (2) whether to use the default encodings or to define
delimiters explicitly.
The first choice allows you to use any object derived from
java.io.InputStream or java.io.OutputStream, or to provide a class
that implements the CharConsumer or CharProducer interface. These are
very simple get/put-next-character style interfaces. Java provides a
wide range of classes derived from InputStream and OutputStream, so
usually this is all you need.
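To make the "get-next-character" idea concrete, the sketch below
defines a stand-in interface and adapts both a java.io.InputStream and
a String to it. NextCharSource and its nextChar method are
hypothetical. The actual CharProducer declaration is in the Javadoc
and may use different method names and a different end-of-input
convention.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class CharSourceSketch {
    /** Hypothetical stand-in for a get-next-character interface. */
    interface NextCharSource {
        /** Returns the next character, or -1 at end of input (assumed convention). */
        int nextChar();
    }

    /** Adapts a stream of 7-bit ASCII bytes, one byte per character. */
    static NextCharSource fromStream(InputStream in) {
        return () -> {
            try {
                return in.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Serves characters from an in-memory string. */
    static NextCharSource fromString(String s) {
        int[] pos = {0}; // mutable index captured by the lambda
        return () -> pos[0] < s.length() ? (int) s.charAt(pos[0]++) : -1;
    }

    /** Pulls characters until end of input and returns them as a string. */
    static String drain(NextCharSource src) {
        StringBuilder sb = new StringBuilder();
        for (int c = src.nextChar(); c != -1; c = src.nextChar()) {
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

A decoder written against such an interface does not need to know
whether the characters come from a file, a socket, or a string.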
The second choice is between a set of default delimiters (";",
"{", "}") and a set of delimiters of your own choosing. You can
also choose whether to allow leading and trailing white space around
values. The default allows white space.
See the Javadoc API documentation for details of the constructor
interfaces.
Class Structure of the Encoded Stream Framework
The general concept of the data flow for the framework is illustrated
below.
Input
Simple Character Producer
|
v
Decoding Classes
|
v
StringInputEDStream subclass that defines an encoding
|
v
Your Application
Output
Simple Character Consumer
^
|
Encoding Classes
^
|
StringOutputEDStream subclass that defines an encoding
^
|
Your Application
A summary of the framework classes in data flow order (from
character production to character consumption) follows.