Update v2.0.1: FileFlex International WorldFlex Functions
FileFlex is used within multimedia productions throughout the world. While
standard ASCII is prevalent, it is certainly not ubiquitous. When dealing
with international languages, it's necessary to account for differences
in character sorting order, for differences in case conversion, for differences
in character values, and for double-byte characters.
FileFlex' new WorldFlex technology now gives you the ability to build international
flexibility into your applications with unprecedented power. FileFlex' WorldFlex
technology gives you true dynamic localization. Unlike virtually all other
so-called "world-aware" implementations, you're not forced to
rely on a particular operating system revision or a country-nationalized
version of an application. FileFlex allows you to define your own international
conversion tables and apply them on-the-fly to any data management task.
This dynamic localization functionality allows you to switch languages,
character sets, sort orders, and conversions at any time throughout the
operation of your multimedia production instantly, with virtually no impact
upon FileFlex' already blazing performance.
FileFlex' WorldFlex technology falls into these three broad categories:
- Dynamic character-level sort order: FileFlex allows you to
use indexes and queries that dynamically switch between sort-order tables.
Finally, an accented "a" character is treated like a regular "a",
rather than something from Mars. Sort orders can be specified for either
single-byte or double-byte languages.
- Character translation: As many FileFlex users have discovered,
the special diacritical characters have different values between Macintosh
and Windows, and even between DOS and Windows. FileFlex allows you to convert
characters so that all the diacritical marks (and any other conversions
you may need) are all in the right places and your characters look just
right.
- Case conversion: Normal case conversion routines apply a simple
heuristic to determine the upper case value of a character. Converting an
"a" to an "A" is simply a matter of subtracting 32.
But what about converting a "u" with an umlaut to an upper case
value? What about converting vowels with accents to their equivalent upper
case characters? FileFlex provides two standalone functions that allow you
to use custom case conversion tables so that your case conversions make
sense in your language. FileFlex' intrinsic index and query functions
also take into account custom case conversion tables so your data can be
case insensitive when desired (as opposed to case insane).
Before we proceed with details of these functions, we'd like to thank our
customers throughout the world for working with us to understand the individual
needs of different languages and customs and how those needs apply to the
authoring of multimedia productions worldwide.
Understanding Character-Level Sort Order
Note: The character-level sorting features in FileFlex require
that you have a measurable amount of programming expertise. These features
let you modify the very core of FileFlex data management and require both
care to use and experience to understand. If you're not a pretty advanced
scripter or programmer, you may want to find an experienced "buddy" to team
up with before attempting to utilize these powerful capabilities.
FileFlex uses index files to sort information. When you create an index
file, you're choosing a field that will determine the sort order of the
database. For example, you might choose to sort on zipcode (a numeric code
in the US that helps the post office tell where to deliver mail--in other
countries this is often called the postal code), meaning that records containing
08553 in the zipcode field will be earlier in the database than records
with 94404 in the zipcode field. Likewise, if you chose to organize your
data based on last name, then "Clinton" would come before "Kennedy".
When you switch indexes, FileFlex doesn't reorder the entire database of
records. Instead it adopts a different sort order based on the data in the
fields. FileFlex creates the order of information in an index file when
DBCreateIndex is called. It maintains and updates that order of information
as part of the process of writing a record.
When FileFlex updates an index file, it's comparing the values in two different
records. When it looks at "Clinton" and "Kennedy", it
looks at the first characters (i.e., "C" and "K") and
determines that "C" comes before "K" and therefore "Clinton" comes before
"Kennedy".
This comparison of "C" vs. "K" is based on the standard
ordered table we call ASCII (American Standard Code for Information Interchange).
When FileFlex compares "C" against "K", it's really
getting the ASCII value of "C" (67 decimal) and comparing it to
the ASCII value of "K" (75 decimal). Since 67 comes before 75,
then "C" comes before "K".
Note: Character sorting is case sensitive. A lower case "c" is ASCII 99
while an upper case "C" is ASCII 67. If you were to compare "clinton"
(note the lower case "c") against "Kennedy", "Kennedy" would come first
because the ASCII value of "K" (75) is less than that of lower case "c" (99).
So, when FileFlex looks at "CLINTON" and "KENNEDY",
it's really looking at the comparative weights (or priorities) of the individual
characters, according to their representation in ASCII. Here are the two strings
and their corresponding values:
C   L   I   N   T   O   N
67  76  73  78  84  79  78
|   |   |   |   |   |   |
75  69  78  78  69  68  89
K   E   N   N   E   D   Y
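If you'd like to check these values yourself, a small utility along the following lines may help. This is only a sketch: the handler name showCodes is our own illustrative invention and is not part of FileFlex. It relies only on Lingo's charToNum and length functions.

```lingo
-- showCodes is a hypothetical helper, not a FileFlex function.
-- It returns the character codes of a string, separated by spaces.
on showCodes theString
  put "" into theCodes
  repeat with n = 1 to length(theString)
    put charToNum(char n of theString) & " " after theCodes
  end repeat
  return theCodes
end showCodes
```

Calling showCodes("CLINTON") should give the values shown in the top row of the diagram, and showCodes("KENNEDY") those in the bottom row.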
Custom Character Sort Orders
FileFlex' new WorldFlex technology allows you to customize the character-level
sort order used by the FileFlex indexing routines. There are two primary
reasons you might want to do this:
- To sort in descending rather than ascending order
- To sort according to sorting rules different than ASCII, in particular
for languages other than English.
In fact, a very important part of WorldFlex technology is the ability to
change the sort order of your characters, and thereby sort your database
according to the sorting rules you feel are currently appropriate.
Many so-called "internationalized", "localized", or
"world-aware" systems do provide support for character sorting
order for multi-country use. But they are usually available only when you're
running the localized version of the operating system or database application.
While many of our friends outside the US are grateful for any mechanism
that recognizes their native language, FileFlex doesn't stop there. FileFlex'
new WorldFlex technology is vastly more powerful. FileFlex allows you to
change your sorting order on-the-fly, as you switch index files.
Nothing else can do this!
Here's an example of where this is so powerful: Imagine you're a multi-national
firm with customers throughout the world. When you do a query to list your
customers in the US, the ASCII sort order is just fine. But when you do
a query to list customers in Japan, you want the customers' names sorted
by the appropriate sorting conventions for the Japanese language and character
sets--not according to the rather provincial expectations of ASCII. With
FileFlex, you can switch from an ASCII index to an index ordered according
to Japanese sort order absolutely instantly.
Creating a Single-Byte Custom Sort Order Table
Character sort orders are controlled by a custom sort order table. For applications
and languages that use single-byte characters (typically, "roman" languages),
each character can be represented by a single byte. Since a
byte is 8 bits wide, this allows for 256 characters.
You create a sort order table in your host development environment's programming
language (our examples will be in Director's Lingo). We do this by building
a table containing three bytes of data for each character in the sort order:
- Leader Flag Byte: For single-byte languages, this byte is always
set to 255.
- Priority Multiplier Byte: For single-byte languages, this byte
is also always set to 255.
- Priority Value Byte: This value signifies the priority of the
character in the list (never use 0).
At the end of all of the three-byte sets, a single byte containing the value
0 is used to terminate the table.
Before we look in more detail at the Priority Value Byte, let's first look
at how ASCII prioritizes its characters:
A   B   C   D   E   ...  V   W   X   Y   Z
65  66  67  68  69  ...  86  87  88  89  90
Since "A" is an ASCII 65, it's got a lower value than "D",
which is an ASCII 68. The numbers 65 and 68 correspond to the priority value
of the various letters. Likewise, in FileFlex' custom sort order tables,
the lower the priority value, the earlier in the sort the character will
be placed. If we wanted to sort in reverse order ("Z" before "A"),
we could assign different priority values, giving "Z" a much lower
number than "A", as in the following list:
Z   Y   X   W   V   ...  E   D   C   B   A
65  66  67  68  69  ...  86  87  88  89  90
With the priorities shown above, if we looked up a "D", we'd see
its value was 87. Since an "A" has a priority value of 90, the
"D" would come earlier in the list. If we used this set of priority
values, "KENNEDY" would certainly appear before "CLINTON".
It's important to remember that the priority value is entirely up to you.
If you wanted all of the vowels (A, E, I, O, and U) to sort ahead of the
consonants, you might create the following table of priority values:
A   E   I   O   U   B   C   D   F   G   H   J   K   L
65  66  67  68  69  70  71  72  73  74  75  76  77  78  ...
FileFlex determines where in the sort order table to find a priority value
based on the character's actual computer-code value (usually ASCII). So,
since "A" has the ASCII code value of 65, FileFlex will look in
the 65th entry in the sort order table to retrieve the priority value. Let's
make this a bit clearer by constructing a partial sort order table for traditional
ASCII (note, we're showing all three data bytes as described above and all
numbers are in base-10):
Entry Pos    65            66            67            68
US Char      "A"           "B"           "C"           "D"
Data Bytes   255 255 065   255 255 066   255 255 067   255 255 068
Entry Pos    69            70            71            72
US Char      "E"           "F"           "G"           "H"
Data Bytes   255 255 069   255 255 070   255 255 071   255 255 072
Entry Pos    73            74            75            76
US Char      "I"           "J"           "K"           "L"
Data Bytes   255 255 073   255 255 074   255 255 075   255 255 076
So, to create a FileFlex sort order table that matches traditional ASCII
in ascending order, you'd want "A" to have a sort order priority
of 65, so the third data byte at position 65 would be the value 65.
Now let's look at how the table would change if we wanted to sort everything
in reverse order (note that we've reversed the entire ASCII character
set):
Entry Pos    65            66            67            68
US Char      "A"           "B"           "C"           "D"
Data Bytes   255 255 190   255 255 189   255 255 188   255 255 187
Entry Pos    69            70            71            72
US Char      "E"           "F"           "G"           "H"
Data Bytes   255 255 186   255 255 185   255 255 184   255 255 183
Entry Pos    73            74            75            76
US Char      "I"           "J"           "K"           "L"
Data Bytes   255 255 182   255 255 181   255 255 180   255 255 179
Using the above table, when FileFlex encounters the character "A",
which has the ASCII value of 65, it looks at the 65th entry in the table.
It then retrieves the priority value, which is 190. If FileFlex then looks
for "C" (in the 67th entry in the table), it retrieves the priority
value of 188. Since 188 is less than 190, FileFlex will put "C" before "A".
Creating Single-Byte Sort Order Utility Scripts
The best way to create the sort order table is to write a simple utility
script. Here's an example script that simply builds a sort order table that
reproduces the standard ASCII order:
on buildSortOrder_ASCII
  global ASCII
  put "" into theTable
  repeat with i = 0 to 255
    put the number of chars of theTable into theChar
    put numToChar(255) after theTable -- no leader char
    put numToChar(255) after theTable -- priority multiplier of 0
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else
      put numToChar(i) after theTable -- priority value
    end if
  end repeat
  put numToChar(0) after theTable -- terminator byte code
  put theTable into ASCII
end buildSortOrder_ASCII
Note the name of the handler is "BuildSortOrder_ASCII". We've
developed a convention where the routine that builds the sort order is called
"BuildSortOrder_" and the name of the sort order itself is appended
to the end. The sort order table is placed in a global variable of the same
name. So, for a sort order for French Canadian, we recommend naming the
handler "BuildSortOrder_FrenchCanadian" and the global variable
containing the sort order "FrenchCanadian".
Note that the routine above places the actual byte value into the string
by using numToChar(x). This places a single byte value corresponding to
the number in the string location. Each set of data bytes in the table gets
two bytes with 255 (for the leader char and priority page 0), and the byte
corresponding to the priority value. Finally, after all the data byte sets
are added to the string, BuildSortOrder_ASCII appends a terminator byte
(value 0).
Here's an example routine that reverses the ASCII sort order, placing the
table in the global ASCIIReverse:
on buildSortOrder_ASCIIReverse
  global ASCIIReverse
  put "" into theTable
  put 255 into priority
  repeat with i = 0 to 255
    put the number of chars of theTable into theChar
    put numToChar(255) after theTable -- no leader char
    put numToChar(255) after theTable -- priority multiplier of 0
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else if priority < 1 then
      -- at i = 255 the countdown reaches 0, and a priority value
      -- must never be 0, so use 1 (same weight as character 254)
      put numToChar(1) after theTable
    else
      put numToChar(priority) after theTable -- priority value
    end if
    put priority-1 into priority
  end repeat
  put numToChar(0) after theTable -- terminator byte code
  put theTable into ASCIIReverse
end buildSortOrder_ASCIIReverse
WARNING: Make absolutely certain you end each sequence with a
numToChar(0) terminator byte. Failure to do this could cause FileFlex to
scan beyond the end of the sort order table and the results could be unpredictable
and your program could abnormally terminate.
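As a further illustration, here's a sketch of a utility script for the vowels-first ordering described earlier. It starts from the table built by buildSortOrder_ASCII above and then overwrites the priority value byte for each upper case letter. The handler name buildSortOrder_VowelsFirst and the global VowelsFirst are our own illustrative names, and only the upper case letters are reprioritized.

```lingo
-- Illustrative sketch only; the names are hypothetical, following the
-- BuildSortOrder_ naming convention described above.
on buildSortOrder_VowelsFirst
  global ASCII, VowelsFirst
  buildSortOrder_ASCII
  put ASCII into theTable
  -- the letters in the order we want them sorted
  put "AEIOUBCDFGHJKLMNPQRSTVWXYZ" into theOrder
  repeat with n = 1 to length(theOrder)
    put charToNum(char n of theOrder) into charCode
    -- each entry is three bytes and strings begin at 1, not 0, so the
    -- priority value byte for charCode is at position charCode*3 + 3
    put numToChar(64 + n) into char (charCode * 3 + 3) of theTable
  end repeat
  put theTable into VowelsFirst
end buildSortOrder_VowelsFirst
```

With this table, "A" through "U" (the vowels) receive priority values 65 through 69, "B" receives 70, and so on up to "Z" at 90. The terminator byte is already in place because the table is copied from ASCII.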
Understanding Double-Byte Sort Order Tables
If the language you're sorting uses double-byte characters (like certain
Japanese and Chinese character sets), you'll need to create double-byte
sort order tables. Double-byte character sets are different because they
use two bytes for many characters. The computer distinguishes between a
standard single-byte character and a dual-byte character by the existence
of a leader byte. This leader byte tells the computer that the byte that
follows the leader byte is to be treated as a special character, rather
than simply part of the standard ASCII table.
FileFlex sort order tables are not limited to 256 entries. Instead, they can
be anywhere from 256 entries long to 65,280 entries long (255 * 256), where
each entry is one of the three-byte sets described earlier. Each set
of 256 entries in the sort order table is called a "sort order page" and
the maximum number of sort order pages allowed by FileFlex is 255.
If you recall from earlier, each character value is represented in the sort
order table by three bytes, a leader char byte, a priority multiplier byte,
and a priority value byte. Also, if you recall, the leader char byte for
single-byte sort order tables was always 255. That told FileFlex to look
in the very first page of the sort order table (i.e., the very first set
of 256 entries) for the character's priority value.
When you're using double-byte character sets, you'll need more than one
256-entry page to represent the sort order. The value that's placed in the
leader character tells FileFlex in which sort order page to look for the
priority value of the character which follows the leader character. Let's
diagram that out:
Suppose that your language character set uses characters with the value
of 128 as a leader character. Now, let's suppose your database has a double-byte
character with the values 128 and 065 respectively for the two bytes. Here's
how the sort order table might be defined:
Sort order page 0
---------------------------
Position #128: 001 255 255
Sort order page 1
---------------------------
Position #65: 255 255 015
When reading the character stream, FileFlex would read the first byte and
determine its value was 128. It would then go to position 128 in the sort
order table and read the first byte of that entry. Since the first byte (the
leader byte flag) is not 255, it would know that 128 was a leader byte. Since the
leader byte flag is 1, FileFlex would know that the next character retrieved
should be compared against sort order page 1 (located in the second bank
of 256 entries).
FileFlex would now read the second byte of the character. Since it knows
that this byte is the second of a double-byte character, FileFlex
will then determine its value (in this case 65) and jump 65
entries into the second sort order page (entry 321, or 256+65, of the
full sort order table). There (position 65 in the second
page), FileFlex would look at the priority value byte and determine that
the priority of the character represented by 128 065 is 15.
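The lookup just described can be modeled in Lingo roughly as follows. This is a sketch for understanding the table layout, not FileFlex' actual internal code: the handler name sketchPriority is hypothetical, it assumes positions in the table are counted in three-byte entries, and it ignores the priority multiplier byte.

```lingo
-- sketchPriority is a hypothetical helper, not a FileFlex function.
-- firstByte and secondByte are character codes (0 to 255).
on sketchPriority theTable, firstByte, secondByte
  -- entry n starts at string position n*3 + 1 (strings begin at 1, not 0)
  put charToNum(char (firstByte * 3 + 1) of theTable) into leaderFlag
  if leaderFlag = 255 then
    -- not a leader byte: the priority value is the third byte of the entry
    return charToNum(char (firstByte * 3 + 3) of theTable)
  end if
  -- leader byte: the flag names the sort order page for the second byte
  put leaderFlag * 256 + secondByte into entryPos
  return charToNum(char (entryPos * 3 + 3) of theTable)
end sketchPriority
```

For the example table above, sketchPriority(theTable, 128, 65) would read the flag 1 at position 128, move to entry 321, and return that entry's priority value byte.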
Creating Double-Byte Sort Order Tables
You create a double-byte sort order table very much like you would a single-byte
table. You create sets of three-byte sequences for each character. For each
sort order page, you create 256 of these three byte sets. At the very end,
you place a single byte with the value 0 that signifies the termination of the
table.
You should probably lay out the sort order tables on paper before you attempt
to write the code to generate a table.
First, you should determine those byte values that are leader bytes. For
every unique leader byte value, assign a sort order page, from page 1 to
254. Obviously, you want to keep the number of absolute sort order pages
down as much as possible to make things run faster and to use less memory.
In the data triplet for each leader byte, place the assigned page number in
the first byte and make sure you've set the following two bytes to 255.
Next, fill in all the other remaining values in the first 256 byte page.
For each character, assign a weighted value and place that in the third
byte of the data triplet.
Note: you can use the second byte of the data triplet as a priority
multiplier. If you need priorities higher than 255, use the priority multiplier
byte by setting it to anything between 1 (earliest in the priority order)
to 254 (last in the priority search list order).
After you've filled in the first sort order page, you can then create the
subsequent pages. In these pages, the first byte of the triplet will always
be 255, the second byte between 1 and 254 depending on your desired priority
multiplier, and the third value byte also between 1 and 254.
Finally, append a terminator byte, which needs to be a numToChar(0) value.
Once you've laid all this out on paper, you can write a BuildSortOrder_
routine that will create a global variable containing your sort order.
Tricks with Sort Order
You can do some pretty interesting things with sort orders besides handling
international issues. For example, let's assume you wanted to sort numerical
data which you stored in a character field.
Note: You generally don't need to do this, because the DBF format stores
numbers as ASCII values internally. But if you use character fields to store
numbers, you get to manipulate values with more control (i.e., sort order).
So, again, let's assume you've got a character field containing numeric
data. Sometimes, in a numeric field, you might want to have spaces or asterisks
instead of zeros, like in the following example:
"0002598"   "   2598"   "***2598"
When creating a custom sort order table for numerical sorts in character
fields, you can give the space character (ASCII 32), the asterisk character
(ASCII 42), and the zero all the same priority value weighting. This would
cause the sorting/seeking routines to treat all three characters the same.
This kind of "equalizing" of sorting values also applies to those
special international characters, like letters with umlauts (i.e., the double dots)
or accent marks over characters. You might want to treat a lower case 'a'
and a lower-case 'a' with an accent mark as the same character in sort order.
You can also do this with upper and lower case values. If you want upper
case and lower case letters to be sorted together, give them the same priority
value.
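Here's a sketch of how the numeric "equalizing" trick might be scripted. It assumes the buildSortOrder_ASCII handler shown earlier; the handler name buildSortOrder_NumericPad and the global NumericPad are illustrative names of our own.

```lingo
-- Illustrative sketch only; the names are hypothetical.
on buildSortOrder_NumericPad
  global ASCII, NumericPad
  buildSortOrder_ASCII
  put ASCII into theTable
  -- each entry is three bytes and strings begin at 1, not 0, so the
  -- priority value byte for character code c is at position c*3 + 3
  put numToChar(48) into char (32 * 3 + 3) of theTable -- space sorts like "0"
  put numToChar(48) into char (42 * 3 + 3) of theTable -- asterisk sorts like "0"
  put theTable into NumericPad
end buildSortOrder_NumericPad
```

The same approach works for case: a repeat loop over 97 to 122 that stores numToChar(i-32) in each priority value byte would give every lower case letter the same weight as its upper case counterpart.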
Setting the Sort Order with FileFlex
You can tell FileFlex to use a new sort order with the FileFlex command
DBSetSortOrder. Unlike most FileFlex commands, DBSetSortOrder is a wrapper
script that does not call FileFlex directly. Instead, DBSetSortOrder
sets two FileFlex global properties: gDBWorldSort and gDBSortOrder.
Note: I almost named the gDBSortOrder variable gDBWorldOrder. Then
the function would have been DBSetWorldOrder. But that seemed far too Republican,
so I restrained myself. Wouldn't it be great if you could write a new translation
table, give a quick call to DBSetWorldOrder, and--poof--a new world order
emerges? It gives new (and terrifying) meaning to the phrase "FileFlex
users rule!" [chuckle] [[shiver]].
Here's the Lingo code for DBSetSortOrder:
on DBSetSortOrder order
  global gDBWorldSort
  global gDBSortOrder
  if order = EMPTY then
    put EMPTY into gDBWorldSort
  else
    put "1" into gDBWorldSort
    put order into gDBSortOrder
  end if
  return 0
end DBSetSortOrder
When you call DBSetSortOrder, you want to pass your sort order table. Here's
an example:
put DBSetSortOrder(ASCII) into DBResult
To disable custom sort order processing, set the sort order to the empty
string:
put DBSetSortOrder("") into DBResult
Inside of FileFlex is a C++ function called worldCompare(). When a DBCreateIndex
or DBSeek command is executed, at some time, the internal worldCompare routine
is called upon to compare two strings. When worldCompare is called, it asks
the host development environment (i.e., Director) for the value of the reserved
global variable gDBWorldSort. If worldCompare discovers that gDBWorldSort
is not empty, it then asks the host environment for the contents of the
global variable gDBSortOrder and uses that to control the comparison of
two strings.
Hint: One of the reasons building a sort order table is so complex
and precise is you're building an actual binary data structure that FileFlex
can use directly. While the table may be a bit painful to design once, this
mechanism allows FileFlex to do custom comparisons and switch sort order
tables at blinding speed.
To turn off a sort order table, send the empty string to DBSetSortOrder.
When this happens, the global gDBWorldSort is set to the empty string. FileFlex
then knows to skip the extra processing inherent in comparing world-aware
data strings.
Cautions: The sort order impacts the internal compare functions;
it does not reorder the dataset or the index. As a result, you should set
your sort order BEFORE you call DBCreateIndex and you should always use
the appropriate sort order table when doing a DBSeek or DBSelectIndex. Failure
to do this could cause your data to appear out of order. When writing records,
try not to get in the situation where two different sort orders need to
be active when writing one record.
Here's a sample script from the Sort Order demo file:
on mouseUp
  global ASCIIReverse -- the reverse sort order table
  -- initialize FF session
  put DBOpenSession() into dbresult
  if dbResult < 0 then
    alert "FileFlex could not initialize!"
    exit
  end if
  -- open a database file
  put dbUse(field "theDBFile") into dbID
  if dbID < 0 then errorClose "Could not open database file."
  --
  -- create a custom index on TITLE using ASCIIReverse
  --
  buildSortOrder_ASCIIReverse -- build the sort order
  put DBSetSortOrder(ASCIIReverse) into dbResult
  put "Creating index file..." into field "status"
  updateStage
  put dbCreateIndex("REVASCII","TITLE","0","0") into ndxID
  if ndxID < 0 then errorClose "Could not create index file."
  -- fill the list
  put "Scanning data file..." into field "status"
  updateStage
  put DBSelectIndex(ndxID) into dbResult
  if dbResult < 0 then errorClose "Could not select index file."
  put "" into theList
  put DBTop() into dbResult
  repeat while 1 = 1 -- forever
    if theList <> "" then put return after theList
    put DBGetFieldByName("TITLE") into title
    updateStage
    put title after theList
    if DBSkip(1) = 3 then exit repeat
  end repeat
  put theList into field "movie list"
  updateStage
  put DBSetSortOrder(EMPTY) into dbResult -- turn off
  put DBCloseSession() into dbresult
  if dbResult < 0 then
    alert "FileFlex could not terminate!"
    exit
  end if
  put "Processing complete..." into field "status"
  updateStage
end
on errorClose s
  alert s
  put DBCloseSession() into dbresult
  if dbResult < 0 then
    alert "FileFlex could not terminate!"
    abort
  end if
  abort
end errorClose
Important: FileFlex uses the xBASE/dBASE III standard format.
This format does not permit 8-bit characters in memo fields contained
within DBT files. Attempting to translate to character values greater
than 127 can cause problems with this format. If you need to store non-ASCII
text in memo fields, you should either use a custom translation table or
store your data in text files and refer to those files from FileFlex fixed-length
fields.
Character Translation
If you're using a language that has special characters in its character
set (e.g., accent marks, umlauts, and other specialty characters), you
may run into an interesting problem moving documents from Macintosh to Windows
or vice-versa. That's because while ASCII is cleanly defined for the US
English character set of "a-zA-Z", that does not mean that character
values of special characters are uniformly used across platforms.
FileFlex user Antonio Lucena of Madrid, Spain describes the conversion issue
as it pertains to DOS vs. Windows files as well:
"The problem is that Windows uses different character set than MS-DOS
(and the databases created with dBASE). MS-DOS uses OEM Char set, and Windows
uses ANSI. For example in OEM, a diacritical "e" is numbered 130,
but in ANSI, same "e" is numbered 233. The same problem appears
when you open a document (with diacritical vowels on it) made with the EDIT
tool from MS-DOS and you try to open it with the WRITE tool from Windows
and no previous conversion was made."
Note: The above message illustrates the value of the free fileflex-talk
mailing list. Another user had discovered the translation problem and by
asking questions to this user and making that dialog public via fileflex-talk,
Antonio was able to see the message and contribute his feedback. With feedback
from him and others, we were able to identify the need for the new DBTranslateChars
function described below.
FileFlex WorldFlex technology provides for character-level translation using
much the same mechanism as used for developing sort order tables. You develop
a translation table that describes the new and old values and pass it to
FileFlex along with a container of characters to be translated.
Setting up a character translation table is very straightforward. You need
to build a Lingo string consisting of 256 characters. The position in the
string is the value of the old character and the value at that position
becomes the new character.
Note: The first character in the string is considered "position
0" by FileFlex. Also note that you cannot place a 0 into any character
position. If you do not want translation, place the corresponding character
value into that position or the value 255.
Creating Character Translation Utility Scripts
The best way to create the character translation table is to write a simple
utility script. Here's an example script that simply reproduces the ASCII
character set, so that each character translates to itself:
on buildTranslateTable_ASCIIX
  global ASCIIX
  put "" into theTable
  repeat with i = 0 to 255
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else
      put numToChar(i) after theTable -- position in table
    end if
  end repeat
  put theTable into ASCIIX
end buildTranslateTable_ASCIIX
Note the name of the handler is "BuildTranslateTable_ASCIIX".
We've developed a convention where the routine that builds the translation
table is called "BuildTranslateTable_" and the name of the translation
itself is appended to the end. In order to prevent confusion from sort order
tables, we've also placed an X after every translation table ("X" for an
often used abbreviation for translate, which is "Xlate").
The translation table is placed in a global variable of the same name. So,
for a translation table that converts to Windows diacriticals, we recommend
naming the handler "BuildTranslateTable_WinCharX" and the global
variable containing the translation table "WinCharX".
Here's an example routine that converts upper case to lower case (and the
reverse):
on buildTranslateTable_CaseReverseX
  global CaseReverseX, ASCIIX
  buildTranslateTable_ASCIIX
  put ASCIIX into theTable
  -- fill in lower case
  repeat with i = 65 to 90
    put numToChar(i+32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  -- fill in upper case
  repeat with i = 97 to 122
    put numToChar(i-32) into char i+1 of theTable
  end repeat
  put theTable into CaseReverseX
end buildTranslateTable_CaseReverseX
The above routine reverses the case, so an upper case "A" becomes
a lower case "a" and vice versa. To create a routine that always
converts to upper case, make both sets of characters upper case. Likewise,
to create a routine that always converts to lower case, make both sets of
characters lower case. Here's an UpperX routine:
on buildTranslateTable_UpperX
  global UpperX, ASCIIX
  buildTranslateTable_ASCIIX
  put ASCIIX into theTable
  -- fill in upper case
  repeat with i = 97 to 122
    put numToChar(i-32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  put theTable into UpperX
end buildTranslateTable_UpperX
WARNING: Make absolutely certain you fill in all 256 bytes. Failure
to do this could cause FileFlex to scan beyond the end of the translation
table and the results could be unpredictable and your program could abnormally
terminate.
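To tie this back to the DOS-versus-Windows problem Antonio described, here's a sketch of how a WinCharX table might begin. It assumes the buildTranslateTable_ASCIIX handler above and fills in only the single conversion quoted earlier (OEM 130 to ANSI 233). A real table would need one such line for every character that differs between the two character sets, and those values are not listed here.

```lingo
-- Illustrative sketch only: a complete table needs many more entries.
on buildTranslateTable_WinCharX
  global WinCharX, ASCIIX
  buildTranslateTable_ASCIIX
  put ASCIIX into theTable
  -- the diacritical "e" is 130 in OEM and 233 in ANSI
  -- using 130+1 below because strings begin at 1, not 0
  put numToChar(233) into char 131 of theTable
  put theTable into WinCharX
end buildTranslateTable_WinCharX
```

Because the table starts as a copy of ASCIIX, all 256 positions are already filled in and every other character translates to itself.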
Translating Characters Using FileFlex
You can use FileFlex to translate character sets within a text container
using the DBTranslateChars function. DBTranslateChars takes two parameters:
the string to be translated and the pre-built translation table described
above. It returns the translated string:
put DBTranslateChars(myString,CaseReverseX) into newString
Here's a sample routine that will do the character translation (it presupposes
that FileFlex has been initialized properly with DBOpenSession):
on mouseUp
  global CaseReverseX
  buildTranslateTable_CaseReverseX
  put DBTranslateChars(field "text data",CaseReverseX) into field "text data"
end mouseUp
Case Translation
If you're using a language that has special characters in its character
set (e.g., accent marks, umlauts, and other specialty characters), you
may run into an interesting problem converting between upper and lower case.
With standard ASCII, it's easy to do a case conversion: just add or subtract
32 to the character's value. That's because in ASCII, the upper or lower
case character is always algorithmically deterministic. However, when dealing
with international character sets where lower case characters might have
diacritical marks, it becomes much harder. That's because the characters
have a wide variety of values and because there is little standardization.
FileFlex WorldFlex technology provides for intelligent case translation
using much the same mechanism as used for developing character translation
tables. You develop a translation table that describes the new and old values
and pass it to FileFlex along with a container of characters to be translated.
You'll need to set up two case translation tables: one going to upper case
and one going to lower case. For each table, you must build a Lingo string
consisting of 256 characters. The position in the string is the value of
the old character and the value at that position becomes the new character.
Note: The first character in the string is considered "position
0" by FileFlex. Also note that you cannot place a 0 into any character
position. If you do not want translation, place the corresponding character
value into that position or the value 255.
Creating Case Translation Utility Scripts
The best way to create the case translation table is to write a simple utility
script. Here's an example script that simply converts ASCII lower case to
ASCII upper case:
on buildCaseTable_AsciiUC
  global AsciiUC
  put "" into theTable
  -- Although it takes a few extra cycles, consider
  -- building a full table first, then modifying it below.
  -- This is much easier to understand and test.
  repeat with i = 0 to 255
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else
      put numToChar(i) after theTable -- position in table
    end if
  end repeat
  -- fill in upper case
  repeat with i = 97 to 122
    put numToChar(i-32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  put theTable into AsciiUC
end buildCaseTable_AsciiUC
Note the name of the handler is "buildCaseTable_AsciiUC". We've
developed a convention where the routine that builds the translation table
is called "buildCaseTable_" with the name of the translation itself
appended to the end. In order to prevent confusion with other tables,
we've also placed "UC" at the end of the name of every table that translates
to upper case (use "LC" for translation to lower case). The finished
table is placed in a global variable named after the translation (here, AsciiUC).
Here's the routine that builds the table for translating back down to lower case:
on buildCaseTable_AsciiLC
  global AsciiLC
  put "" into theTable
  -- Although it takes a few extra cycles, consider
  -- building a full table first, then modifying it below.
  -- This is much easier to understand and test.
  repeat with i = 0 to 255
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else
      put numToChar(i) after theTable -- position in table
    end if
  end repeat
  -- fill in lower case
  repeat with i = 65 to 90
    put numToChar(i+32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  put theTable into AsciiLC
end buildCaseTable_AsciiLC
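The same pattern extends to accented characters. Here's a sketch that assumes your data is stored in the Windows Latin-1 character set (code page 1252), where the accented lower case letters 224 through 254 sit exactly 32 above their upper case forms, with the exception of 247 (the division sign). The handler name and the global WinLatin1UC are hypothetical names chosen to follow the naming convention above. If your data uses a different character set, the values will differ.

```lingo
-- Hypothetical example: upper case table for Windows Latin-1 data.
-- Assumes buildCaseTable_AsciiUC (shown above) is in the same movie.
on buildCaseTable_WinLatin1UC
  global AsciiUC, WinLatin1UC
  buildCaseTable_AsciiUC -- start from the plain ASCII table
  put AsciiUC into theTable
  -- fill in the accented upper case letters
  repeat with i = 224 to 254
    if i <> 247 then -- 247 is the division sign, leave it alone
      put numToChar(i-32) into char i+1 of theTable
      -- using i+1 above because strings begin at 1, not 0
    end if
  end repeat
  put theTable into WinLatin1UC
end buildCaseTable_WinLatin1UC
```

Starting from the finished ASCII table keeps the new handler short and guarantees that every position is already filled before the accented entries are overwritten. A matching lower case table would be built the same way, adding 32 to the values 192 through 222 and skipping 215 (the multiplication sign).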
WARNING: Make absolutely certain you fill in all 256 bytes. Failure
to do this could cause FileFlex to scan beyond the end of the translation
table. The results could be unpredictable, and your program could terminate
abnormally.
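One way to guard against an incomplete table is to check it before handing it to FileFlex. The following is a minimal sketch. The handler name caseTableLooksComplete is hypothetical and is not part of FileFlex. It checks only the two rules stated in this section: the table must be 256 characters long, and no position may hold the value 0.

```lingo
-- Hypothetical helper (not part of FileFlex): returns TRUE if
-- theTable is 256 characters long and contains no 0 values.
on caseTableLooksComplete theTable
  if length(theTable) <> 256 then
    return FALSE
  end if
  repeat with i = 1 to 256
    if charToNum(char i of theTable) = 0 then
      return FALSE
    end if
  end repeat
  return TRUE
end caseTableLooksComplete
```

You might call this once after each table-building handler during development and display an alert if it returns FALSE.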
Intelligent Case Conversion Using FileFlex
Case translation is used in a number of important ways within FileFlex,
in particular within the intrinsic functions used in indexes and queries,
and through special utility functions provided to perform simple case conversion.
You can tell FileFlex to use a pair of case translation tables with the
FileFlex command DBSetCaseTables. Unlike most FileFlex commands, DBSetCaseTables
is a wrapper script that does not call FileFlex directly. Instead,
DBSetCaseTables sets three reserved FileFlex global variables: gDBWorldCase,
gDBWorldUpper, and gDBWorldLower.
Here's the Lingo code for DBSetCaseTables:
on DBSetCaseTables upperTable, lowerTable
  global gDBWorldCase
  global gDBWorldUpper, gDBWorldLower
  if (upperTable = EMPTY or lowerTable = EMPTY) then
    put EMPTY into gDBWorldCase
  else
    put "1" into gDBWorldCase
    put upperTable into gDBWorldUpper
    put lowerTable into gDBWorldLower
  end if
  return 0
end DBSetCaseTables
When you call DBSetCaseTables, pass your upper case table first and your
lower case table second. Here's an example:
put DBSetCaseTables(AsciiUC, AsciiLC) into DBResult
To disable custom case conversion processing, pass the empty string to
DBSetCaseTables:
put DBSetCaseTables("") into DBResult
Inside of FileFlex is a C++ function called worldUpper(). When an intrinsic
UPPER function is executed, the internal worldUpper routine is called upon
to do the case conversion. When worldUpper is called, it asks the host development
environment (i.e., Director) for the value of the reserved global variable
gDBWorldCase. If worldUpper discovers that gDBWorldCase is not empty, it
then asks the host environment for the contents of the global variables
gDBWorldUpper and gDBWorldLower and uses them to control the conversion
of the strings.
To turn off custom case conversion, send the empty string to DBSetCaseTables.
When this happens, the global gDBWorldCase is set to the empty string. FileFlex
then knows to skip the extra processing inherent in case conversion of world-aware
data strings.
Cautions: Be careful that the first parameter is the upper case table
and the second parameter is the lower case table. Also make sure you pass
two tables. Failure to pass two complete case conversion tables could cause
unpredictable results and might lead to abnormal termination.
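Putting the pieces together, here's a sketch of the full sequence using the ASCII tables built earlier. The handler name useAsciiCaseTables is hypothetical. Everything it calls comes from the examples in this section, and it presupposes that FileFlex has been initialized properly with DBOpenSession.

```lingo
-- Hypothetical wrapper showing the order of operations.
on useAsciiCaseTables
  global AsciiUC, AsciiLC
  -- build both tables before setting them
  buildCaseTable_AsciiUC
  buildCaseTable_AsciiLC
  -- upper case table first, lower case table second
  put DBSetCaseTables(AsciiUC, AsciiLC) into DBResult
  -- ... index, query, and case conversion calls go here ...
  -- turn custom case conversion back off when finished
  put DBSetCaseTables("") into DBResult
end useAsciiCaseTables
```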
Standalone Intelligent Case Conversion Functions
In addition to doing intelligent case conversions within index and query
functions, FileFlex provides you with the ability to do intelligent case
conversions of standalone strings.
The function DBUpper will convert a string intelligently from lower case
to upper case. If case tables have already been set with DBSetCaseTables,
DBUpper will use those tables; otherwise, it will use the standard ASCII
upper case conversion. Here's how to call DBUpper:
put DBUpper(string) into newString
Likewise, DBLower will convert a string intelligently from upper case to
lower case. If case tables have already been set with DBSetCaseTables, DBLower
will use those tables; otherwise, it will use the standard ASCII lower case
conversion. Here's how to call DBLower:
put DBLower(string) into newString
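As with the character translation sample earlier, these functions can be applied directly to a field. Here's a short sketch that converts the contents of a field to upper case when the user clicks. The field name "text data" is borrowed from the earlier sample, and the script presupposes that FileFlex has been initialized properly with DBOpenSession.

```lingo
on mouseUp
  -- uses the tables set with DBSetCaseTables, if any;
  -- otherwise falls back to standard ASCII conversion
  put DBUpper(field "text data") into field "text data"
end mouseUp
```

Swapping DBUpper for DBLower gives the lower case version of the same script.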