
Overview: Encoded Streams

Bojangles supports a variety of encoded data streams for representing program data. This note is an overview of the encoding schemes and of the software structure provided to support them.

Delimited streams

The basic Bojangles interface to external data representation is the EDStream. EDStreams allow high-level operations such as readDouble or writeManaged, which requires that a stream be able to parse its contents into separate values. In addition, EDStreams support sub-streams set off by begin and end indications, so a stream must also represent the begin and end points of its sub-streams. The general scheme for encoded data streams is to use special characters that cannot occur in data values (unless they are escaped) to separate values and to mark the beginning and end of sub-streams. A pseudo-BNF grammar for these streams is:
   stream          = dtoken* EOF
   dtoken          = token separator | begin | end
   begin           = ; a single specified character
   end             = ; a single specified character
   separator       = ; a single specified character
   token           = ; any sequence of characters, including escaped
                     ; versions of the begin, end, and separator
                     ; characters, represented in a stream-specific
                     ; encoding
For example, assume that the begin, end, and separator characters are "{", "}", and ";" respectively. Then the following are possible encoded representations of data.

     23; 42 ; -209.67865 ; A String ; true ; x; EOF

Representing the values:

     int       23
     int       42
     double    -209.67865
     String    "A String"
     boolean   true
     char      'x'

And,

     23 ; { 42;  75   ; 10 ; } 45; EOF

Represents a stream with a sub-stream containing: 42, 75, and 10.
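
The grammar above can be illustrated with a small tokenizer. This is only a sketch under stated assumptions, not the Bojangles implementation: the class name DelimitedStreamSketch and its parse method are hypothetical, the delimiters are the "{", "}", ";" of the examples, backslash is assumed to be the escape character, and tokens are returned still in their encoded form.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a delimited stream into tokens and begin/end markers. */
class DelimitedStreamSketch {
    static final char BEGIN = '{', END = '}', SEPARATOR = ';', ESCAPE = '\\';

    /** Returns the tokens in order; sub-stream boundaries appear as "{" and "}". */
    static List<String> parse(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        int keep = 0; // token length up to the last non-blank or escaped character
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ESCAPE && i + 1 < input.length()) {
                // An escaped character never acts as a delimiter; keep the pair encoded.
                token.append(c).append(input.charAt(++i));
                keep = token.length();
            } else if (c == SEPARATOR) {
                token.setLength(keep);          // drop trailing unescaped white space
                out.add(token.toString());
                token.setLength(0);
                keep = 0;
            } else if (c == BEGIN || c == END) {
                out.add(String.valueOf(c));     // structural marker, not a value
                token.setLength(0);             // stray unterminated text is discarded here
                keep = 0;
            } else if (Character.isWhitespace(c)) {
                if (token.length() > 0) token.append(c); // skip leading white space
            } else {
                token.append(c);
                keep = token.length();
            }
        }
        return out;
    }
}
```

Under these assumptions, DelimitedStreamSketch.parse("23 ; { 42;  75   ; 10 ; } 45; ") yields the list [23, {, 42, 75, 10, }, 45]. An escaped begin character in a value stays as the two-character pair, so it cannot be confused with a sub-stream marker.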

Encoding of Object data

The MOFW EDStreams support reading and writing objects as well as simple values. The encoding of objects is a little more complex than that of other values. In addition, EDStreams support two ways of reading and writing objects: via specific methods (e.g., writeString, readManageable) and via generic methods (readObject, writeObject). The two kinds of methods produce the same encodings, except that the generic encodings also prepend a type code (as shown below) before each value.

Four cases are supported. Each has an encoding that consists of one or more separate values, as follows (only the generic encodings are shown; the encodings produced by the specific methods are the same without the leading type code):

  • null: null object values are written as the single character 'n'. (Note: in the generic encoding all null objects are written this way; only the specific methods can produce the type-specific encodings of null shown below.)
  • Manageable: written as the character 'e', followed by the name of the object's class, followed by the externalization of the object's value. For example, a simple manageable object with class "COM.ibm.foo" and two integer values in its essential data would be externalized like this: e; COM.ibm.foo; 3; 8; A null Manageable (produced by writeManageable) is encoded like this: null;
  • Managed: written as the character 'm', followed by the name of the object's class, followed by the id of the object. For example: m; COM.ibm.fooMO; /bar/baz/xxx;
  • String: written as the character 's', followed by the string's value. For example: s; This is a string;

    A null string (written by writeString) is encoded like this:

    <<NULL>>

    This is a minor limitation that will be removed in future releases. For now, "<<NULL>>" is a reserved string value and cannot be written correctly (it will look like a null string).
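
The four cases can be summarized in a short sketch. It is illustrative only: the class ObjectEncodingSketch and its method names are hypothetical, values are assumed to be already encoded (no escaping is done), and ";" followed by a space is assumed as the separator layout, as in the examples above.

```java
/** Hypothetical sketch of the generic and specific object encodings described above. */
class ObjectEncodingSketch {
    /** Writes each value followed by the separator, with a space between tokens. */
    static String tokens(String... values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(v).append(';');
        }
        return sb.toString();
    }

    /** Generic encoding of any null object: the single type code 'n'. */
    static String genericNull() {
        return tokens("n");
    }

    /** Generic encoding of a Manageable: 'e', class name, then its essential data. */
    static String genericManageable(String className, String... essentialData) {
        String[] all = new String[essentialData.length + 2];
        all[0] = "e";
        all[1] = className;
        System.arraycopy(essentialData, 0, all, 2, essentialData.length);
        return tokens(all);
    }

    /** Generic encoding of a Managed object: 'm', class name, then the object id. */
    static String genericManaged(String className, String id) {
        return tokens("m", className, id);
    }

    /** Generic encoding of a String: 's' then the value; null falls back to 'n'. */
    static String genericString(String value) {
        return value == null ? genericNull() : tokens("s", value);
    }

    /** Specific (writeString-style) encoding: no type code, reserved value for null. */
    static String specificString(String value) {
        return tokens(value == null ? "<<NULL>>" : value);
    }
}
```

With these assumptions, genericManageable("COM.ibm.foo", "3", "8") produces "e; COM.ibm.foo; 3; 8;" and specificString(null) produces "<<NULL>>;", which is why a real string with that value cannot be told apart from a null string.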

ASCII Encoding

    ASCII encoded streams are restricted to the printable 7-bit ASCII values (32-126 inclusive) plus newline and return. Values other than strings and characters are represented as they would be in Java source code. E.g., booleans are represented by "true" or "false", ints are represented as base 10 numbers, etc.

    In string and character values, a set of escapes is allowed so that any UNICODE character value can be encoded. The supported escapes are:

    
       \n  newline
       \r  return
       \f  form feed
       \t  horizontal tab
       \uxxxx (where xxxx is four hex digits) for any UNICODE character.
    
    
    In addition, any printable ASCII character can be escaped to prevent it from being recognized as a delimiter (see the discussion above).

    For example:

    
       23; \ This string starts with a space and contains a \;. ; \u7865; EOF
    
    
    represents a stream with three values, 23, " This string starts with a space and contains a ;.", and the UNICODE character with hex value 7865.
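
The escape rules above can be sketched as a pair of helper methods. This is a minimal illustration, not the ASCIIEncoder or ASCIIDecoder code: the class AsciiEscapeSketch is hypothetical, and escaping a leading or trailing space (so that white-space trimming cannot lose it) is an assumption based on the example.

```java
/** Hypothetical sketch of the ASCII escape rules for string and character values. */
class AsciiEscapeSketch {
    /** Encodes a value into printable ASCII, escaping the given delimiter characters. */
    static String encode(String value, String delimiters) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean edgeSpace = c == ' ' && (i == 0 || i == value.length() - 1);
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\f': sb.append("\\f"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c == '\\' || delimiters.indexOf(c) >= 0 || edgeSpace) {
                        sb.append('\\').append(c);      // escaped printable character
                    } else if (c >= 32 && c <= 126) {
                        sb.append(c);                   // plain printable ASCII
                    } else {
                        // four hex digits after a backslash and the letter u
                        sb.append(String.format("\\u%04x", (int) c));
                    }
            }
        }
        return sb.toString();
    }

    /** Reverses encode; assumes well-formed input (four hex digits after backslash-u). */
    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '\\' || i + 1 == encoded.length()) {
                sb.append(c);
                continue;
            }
            char e = encoded.charAt(++i);
            switch (e) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 'f': sb.append('\f'); break;
                case 't': sb.append('\t'); break;
                case 'u':
                    sb.append((char) Integer.parseInt(encoded.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: sb.append(e);                  // any other escaped character is literal
            }
        }
        return sb.toString();
    }
}
```

Encoding the second value of the example with the delimiters ";{}" gives back the escaped text shown in the stream, and decode returns the original string. One consequence of this scheme is that the letters n, r, f, t, and u cannot themselves be used as delimiters, since a backslash in front of them already has another meaning.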

    Using the ASCII EDStreams

    These streams are ASCIIInputEDStream and ASCIIOutputEDStream.

    Four constructors are provided for both input and output ASCII streams. Basically, these constructors support the four combinations of two choices: (1) what should be used as the source/sink for characters, and (2) whether to use the default encodings or to define delimiters explicitly.

    The first choice allows you either to use any object derived from java.io.InputStream or java.io.OutputStream, or to provide a class that supports the CharProducer or CharConsumer interface. These are very simple get/put-next-character style interfaces. Java provides a wide range of classes derived from InputStream or OutputStream, so usually this is all you need.

    The second choice is between a set of default delimiters (";", "{", "}") and a set of delimiters of your own choosing. You can also choose whether to allow leading and trailing white space around values. The default allows white space.

    See the Javadoc API documentation for details of the constructor interfaces.
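
The constructor signatures themselves are not reproduced in this note, so the following is only a sketch of the "get-next-character" style of source described above. The names CharProducerSketch and InputStreamCharProducerSketch are hypothetical; the two methods follow the CharProducer summary later in this note, while returning -1 at end of input and wrapping IOException are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a get-next-character style source. */
interface CharProducerSketch {
    int read();              // next character, or -1 at end of input (assumed convention)
    void putBack(char c);    // makes c the next character returned by read()
}

/** Adapts any java.io.InputStream carrying 7-bit ASCII to the interface above. */
class InputStreamCharProducerSketch implements CharProducerSketch {
    private final InputStream in;
    private int pushedBack = -1; // one character of look-ahead, or -1 if none

    InputStreamCharProducerSketch(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() {
        if (pushedBack >= 0) {
            int c = pushedBack;
            pushedBack = -1;
            return c;
        }
        try {
            return in.read();    // each byte is one ASCII character
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void putBack(char c) {
        pushedBack = c;
    }
}
```

A single character of push-back is enough for a tokenizer that needs to peek at the next character before deciding whether a value has ended; whether the real classes keep more than one is not stated here.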

    Class Structure of the Encoded Stream Framework

    The general concept of the data flow for the framework is illustrated below.

    Input

                        Simple Character Producer
                                    |
                                    v
                            Decoding Classes
                                    |
                                    v
          StringInputEDStream subclass that defines an encoding
                                    |
                                    v
                            Your Application
    
    

    Output

                        Simple Character Consumer
                                    ^
                                    |
                            Encoding Classes
                                    ^
                                    |
          StringOutputEDStream subclass that defines an encoding
                                    ^
                                    |
                            Your Application
    
    

    A summary of the framework classes in data flow order (from character production to character consumption) follows. In the list, (I) marks an interface, (C) a class, and (AC) an abstract class.

  • CharProducer (I) Produces characters one at a time. This is the base interface for character sources; all up-stream classes require only objects that support this interface.
      • Methods:
          • int read();
          • void putBack(char c);
      • Implemented by:
          • StringCharProducer (C)
          • StringBufferCharProducer (C)
          • InputStreamCharProducer (C)
          • ASCIIDecoder (C)
          • URLDecoder (C) pending
  • Tokenizer (C) Produces strings from a CharProducer source based on delimiters. Allows escapes.
      • Methods:
          • int read(StringBuffer buf, boolean trim);
          • static void trim(StringBuffer buf, int startPos);
  • StringProducer (AC) Produces strings from a string or character source. Uses Tokenizer to parse the sources if necessary. Adds support for atEnd, atEOF, readBegin, readEnd.
      • Methods:
          • boolean atEnd();
          • boolean atEOF();
          • String read();
          • void read(StringBuffer buf);
          • void readBegin();
          • void readEnd();
      • Implemented by:
          • CharStringProducer (C)
          • StringDecoder (AC)
          • ASCIIStringDecoder (C)
          • URLStringDecoder (C) pending
  • StringInputEDStream (C) Implements an InputEDStream based on a string source.
      • Implemented by:
          • ASCIIInputEDStream (C)
          • URLInputEDStream (C) pending
  • StringOutputEDStream (C) Implements an OutputEDStream based on a string consumer.
      • Implemented by:
          • ASCIIOutputEDStream (C)
          • URLOutputEDStream (C) pending
  • StringConsumer (AC) Consumes strings, usually converting them into a stream of characters with delimiters. Adds support for writeBegin, writeEnd.
      • Methods:
          • void write(String str);
          • void write(StringBuffer buf);
          • void write(StringBuffer buf, int startPos, int endPos);
          • void writeBegin();
          • void writeEnd();
      • Implemented by:
          • ASCIIStringConsumer (C)
          • URLStringConsumer (C) pending
  • CharConsumer (I) Consumes characters one at a time.
      • Methods:
          • void write(int c);
          • void write(CharProducer in);
      • Implemented by:
          • StringBufferCharConsumer (C)
          • OutputStreamCharConsumer (C)
          • ASCIIEncoder (C)
          • URLEncoder (C) pending
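
To make the output half of the data flow concrete, here is a minimal sketch of a string consumer feeding a character consumer. All three type names are hypothetical stand-ins rather than the framework classes, only a subset of the listed methods is shown, and escaping of delimiter characters inside values is deliberately left out.

```java
/** Hypothetical stand-in for a put-next-character style sink. */
interface CharConsumerSketch {
    void write(int c);
}

/** Collects characters in memory, as a buffer-backed consumer might. */
class BufferCharConsumerSketch implements CharConsumerSketch {
    private final StringBuilder buf = new StringBuilder();

    @Override
    public void write(int c) {
        buf.append((char) c);
    }

    @Override
    public String toString() {
        return buf.toString();
    }
}

/** Turns strings into delimited characters; values are assumed to be already escaped. */
class DelimitedStringConsumerSketch {
    private final CharConsumerSketch out;
    private final char begin, end, separator;

    DelimitedStringConsumerSketch(CharConsumerSketch out, char begin, char end, char separator) {
        this.out = out;
        this.begin = begin;
        this.end = end;
        this.separator = separator;
    }

    /** Writes one value followed by the separator character. */
    void write(String str) {
        for (int i = 0; i < str.length(); i++) {
            out.write(str.charAt(i));
        }
        out.write(separator);
    }

    /** Marks the start of a sub-stream. */
    void writeBegin() {
        out.write(begin);
    }

    /** Marks the end of a sub-stream. */
    void writeEnd() {
        out.write(end);
    }
}
```

Writing "23", then a sub-stream holding "42" and "75", then "45" through a consumer built with '{', '}', ';' leaves "23;{42;75;}45;" in the buffer, which matches the shape of the delimited-stream examples at the start of this note.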