Converting your Application
We will first take you through a step-by-step process of
converting your application into a global one. This will also guide
you in making an application global from the beginning.
For more detailed information about each of the topics, you
should definitely consult the Java 1.1 International API
documentation. Thoroughly covering all the issues involved in
developing global applications is beyond the scope of this paper,
but there are a number of resources available on the Web or in
print. See the references at the end of this document.
In the course of this section our examples are additive, with each
successive one adding to code that may already have been
converted to some extent. In those cases, the Old heading
refers to the partially converted code, not to your original.
When the user starts up a global application, the default
locale will be used for the display text and other user interface
elements. If the user changes the default locale (generally with
some mechanism on the host system), then the application comes up in a
different user interface language. If you wish, you can go further and
support a multilingual application (or applet), which
allows use of multiple locales simultaneously. We'll discuss this
later.
The first step to take in preparing your program is to enable
translation of display strings by separating them from the
rest of your code. With Java, this is done via ResourceBundles.
These provide a general mechanism that allows you to access
strings and other objects according to locale conventions. In
principle, they are fairly simple: They simply provide a mapping
from <key,locale> to <value>. However, they also
supply inheritance among locale resources that allows you to
minimize duplication across countries, and gives you a graceful
degradation in case the exact locale does not have localized
resources.
Unfortunately, at present you will have to change
your code by hand to make your strings translatable. You can do
the bulk of the work with a Perl script if you want, but you will
still have to check the results. The general problem with
converting strings automatically is the difficulty in
distinguishing display strings from non-display strings. Once the
commercial Java development environments fully support resource
bundles, this task should become easier, since they will make it
easy to avoid mixing display strings in the code in the first
place.
Resource bundles are very flexible--so flexible that getting started may be
mystifying. You can use list resource bundles,
property resource bundles, or you can make your own. You can have
fine-grained bundles or coarse-grained, and so on. To give you
some direction, we'll start by showing one particular way to
use resource bundles, and then we'll discuss some of the other options.
- Create a class called MyResources.
Making an Empty
ResourceBundle
New
|
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class MyResources extends ListResourceBundle {
    // boilerplate
    public static ResourceBundle rb =
        ResourceBundle.getBundle("MyResources");
    public Object[][] getContents() { return contents; }
    static final Object[][] contents = {
        // insert localized {key, value} pairs below
    };
}
|
- Treat each display string as in the following example,
inserting:
Moving Strings to Resource
Bundles
Old
|
myCheckbox = new Checkbox("Clean ink cartridge before printing document");
|
// insert localized {key, value}, pairs below
...
|
New
|
myCheckbox = new Checkbox(MyResources.rb.getString("CleanCartridge"));
|
// insert localized {key, value} pairs here
{"CleanCartridge", "Clean ink cartridge before printing document"},
...
|
- You have now set up a series of resource pairs of the
form {key, value}. The resource keys (such as "CleanCartridge")
can be any unique string you want, even the original string.
However, you are better off using short, clear names:
remember that your translators will be seeing these too.
- To then create a new French translation for your program:
- Copy MyResources and rename it to MyResources_fr
by appending the proper Java language ID. (You
can see a list of the language IDs on the Unicode Web site.
By convention, language codes are lowercased.)
- Make it extend its parent, MyResources.
- Remove the static rb (you only want this
on the root class).
- Translate the resource values into French (but
not the keys!).
- If any of the values are unchanged--the same as
the parent--remove the whole {key, value} pair.
(You can do this because the {key, value} pairs
are inherited from the parent.)
- Do the same again for any other language
you want to support.
Translating Resource Bundles
Old
|
// insert localized {key, value} pairs here
{"CleanCartridge", "Clean ink cartridge before printing document"},
|
New
|
// insert localized {key, value}, pairs below
{"CleanCartridge", "Cleanez le cartridge de inque après que... "},
|
- If you have special strings for a particular country--and
not just language--then you do much the same as above.
- Create a bundle in the same way for that country,
such as MyResources_fr_BE for Belgium,
by appending the proper Java country ID (You can
see a list of the country IDs on the Unicode Web site.
By convention, country codes are uppercased.)
- Make it extend its parent; in this case, MyResources_fr.
- If any of the values are unchanged--the same as
the parent--remove the whole {key, value} pair.
The way this is set up, the static rb will be
initialized with the proper resource bundle according to the
default Locale. This convenience allows us to refer to
that bundle. You can use locale variables instead to reference
the resource bundle, if you want.
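The inheritance described above can be sketched in a few lines. This is only an illustration under some assumptions: the class names (BundleFallbackDemo, Messages, Messages_fr) and the "Cancel" key are made up for the example, and the bundles are nested classes so that everything fits in one file. Here the French bundle extends ListResourceBundle directly, because getBundle() links a bundle to its parent when it loads the chain.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

final class BundleFallbackDemo {
    /** Root bundle: holds every key, in the base language. */
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"CleanCartridge", "Clean ink cartridge before printing document"},
                {"Cancel", "Cancel"},
            };
        }
    }

    /** French bundle: only the values that differ from the root. */
    public static class Messages_fr extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"Cancel", "Annuler"},
            };
        }
    }

    /** Looks a key up for the given locale; missing keys fall back to the parent. */
    static String lookup(Locale locale, String key) {
        ResourceBundle rb = ResourceBundle.getBundle(Messages.class.getName(), locale);
        return rb.getString(key);
    }

    public static void main(String[] args) {
        Locale belgianFrench = Locale.forLanguageTag("fr-BE");
        // No Messages_fr_BE exists, so the lookup starts at Messages_fr.
        System.out.println(lookup(belgianFrench, "Cancel"));
        // "CleanCartridge" is not in Messages_fr, so the root value is used.
        System.out.println(lookup(belgianFrench, "CleanCartridge"));
    }
}
```

Because the Belgian French request finds no country bundle, it degrades to the French bundle, and keys missing there degrade to the root.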
If your resource bundle gets too large then you can subdivide
it into other resource bundles, such as MyPrintingResources,
following the same pattern. You can make this as fine-grained as
you want, so that you only load the resources for a part of your
program when that part gets used. You can return arbitrary
objects, not only strings, since it is often easier (and
sometimes necessary, such as for graphics that need to be
localized). For example:
Moving Objects to Resource
Bundles
Old
|
myCheckbox = new Checkbox("Clean ink cartridge before printing document");
|
// insert localized {key, value}, pairs below
...
|
New
|
myCheckbox = (Checkbox) MyResources.rb.getObject("CleanCartridge");
|
// insert localized {key, value} pairs here
{"CleanCartridge", new Checkbox("Clean ink cartridge before printing document")},
...
|
You can also use a PropertyResourceBundle instead of
a ListResourceBundle. In that case, you
follow the same pattern, but instead of creating a class, you put
the {key, value} pairs into a properties file. The name of
the properties file is the same as the name of the ListResourceBundle
that you would have had, with a .properties extension. Since you don't have classes any more,
put your static rb in some convenient place, such as in
your applet, and use that name for references (e.g. MyApplet.rb.getString("CleanCartridge")).
However, if you use a PropertyResourceBundle, be aware
that you can only extract strings, not other kinds of objects.
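As a rough sketch of the property-based variant, the example below builds a PropertyResourceBundle from in-memory text that stands in for the contents of a properties file. The class name PropertyBundleDemo and the "Cancel" key are assumptions for illustration, and the Reader-based constructor is from Java releases later than 1.1; normally you would let ResourceBundle.getBundle() find the file for you.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class PropertyBundleDemo {
    /** Builds a bundle from text laid out like a MyResources.properties file. */
    static ResourceBundle load() {
        String fileContents =
              "# key=value pairs, one per line\n"
            + "CleanCartridge=Clean ink cartridge before printing document\n"
            + "Cancel=Cancel\n";
        try {
            return new PropertyResourceBundle(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle rb = load();
        System.out.println(rb.getString("CleanCartridge"));
        // Every value in a property bundle is a String.
        System.out.println(rb.getObject("Cancel") instanceof String);
    }
}
```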
Resource bundles have a very simple interface. If you want
to use other sources for your strings, you can always subclass to
make your own resource bundle. For example, you could write one
that accessed strings or serialized objects out of a database, or
even over the Web. The basic requirements are to map keys to
values, and provide for the inheritance of keys discussed above.
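To make those two requirements concrete, here is a minimal sketch of a custom bundle backed by a Map. The class name MapBundle and the demo data are hypothetical; a real implementation would fill the map from a database or some other source.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.ResourceBundle;
import java.util.Set;

final class MapBundle extends ResourceBundle {
    private final Map<String, Object> values;

    MapBundle(Map<String, Object> values, ResourceBundle parentOrNull) {
        this.values = new HashMap<>(values);
        if (parentOrNull != null) {
            setParent(parentOrNull); // enables inheritance of missing keys
        }
    }

    @Override
    protected Object handleGetObject(String key) {
        // Returning null tells ResourceBundle to ask the parent bundle.
        return values.get(key);
    }

    @Override
    public Enumeration<String> getKeys() {
        Set<String> keys = new LinkedHashSet<>(values.keySet());
        if (parent != null) {
            keys.addAll(Collections.list(parent.getKeys()));
        }
        return Collections.enumeration(keys);
    }

    /** Builds a French bundle whose parent is a root bundle (illustrative data). */
    static ResourceBundle demo() {
        Map<String, Object> root = new HashMap<>();
        root.put("CleanCartridge", "Clean ink cartridge before printing document");
        root.put("Cancel", "Cancel");
        Map<String, Object> french = new HashMap<>();
        french.put("Cancel", "Annuler");
        return new MapBundle(french, new MapBundle(root, null));
    }

    public static void main(String[] args) {
        ResourceBundle rb = demo();
        System.out.println(rb.getString("Cancel"));
        System.out.println(rb.getString("CleanCartridge"));
    }
}
```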
For more information, see the Taligent
Java Demos and the JavaSoft
International Specification.
Remove Concatenation
If you haven't done much internationalization, then you may
not have heard the mantra: Never concatenate display strings!
Why is this a problem? Well, the order of parts of a sentence is
different in different languages; this difference can easily lead you into
trouble. For example, if you write MyResources.rb.getString("DeleteBefore")
+ someDate, the localizer is limited to modifying only the
string, and not the position of the date. If the language
requires verbs to be at the end of the sentence, the localizer is
stuck.
You can replace concatenation by use of MessageFormats, which
allow the localizer to position the variable information
appropriately:
MessageFormat Instead of
Concatenation
Old
|
myCheckbox = new Checkbox(MyResources.rb.getString("DeleteBefore") + someDate);
|
// insert localized {key, value}, pairs below
{"DeleteBefore", "Delete all files before "},
|
New
|
MessageFormat mf = new MessageFormat(MyResources.rb.getString("DeleteBefore"));
myCheckbox = new Checkbox(mf.format(new Object[] {someDate}));
|
// insert localized {key, value}, pairs below
{"DeleteBefore", "Delete files before {0}"},
|
Note
|
The reason for using the array of objects for the
parameter is to allow multiple arguments. There will
probably be convenience methods in the future to make
this a bit smoother. |
This new pattern string can then be localized, allowing
rearrangement of the position of the argument {0}. If you
want to, you could combine this into a single statement, using
the static MessageFormat.format():
One-Line MessageFormat
New
|
myCheckbox = new Checkbox(MessageFormat.format(
MyResources.rb.getString("DeleteBefore"), new Object[] {someDate}));
|
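The following sketch shows why the pattern approach helps the localizer. The German pattern is a hypothetical translation written for this example, and the date is passed as a ready-made string so that the output does not depend on locale data.

```java
import java.text.MessageFormat;

final class ReorderDemo {
    /** Substitutes the argument wherever the (localized) pattern puts {0}. */
    static String render(String pattern, String when) {
        return MessageFormat.format(pattern, new Object[] {when});
    }

    public static void main(String[] args) {
        // English pattern: the argument comes last.
        System.out.println(render("Delete files before {0}", "13.06.1997"));
        // Hypothetical German pattern: the verb moves to the end, after {0}.
        System.out.println(render("Dateien vor dem {0} l\u00f6schen", "13.06.1997"));
    }
}
```

The calling code is identical for both languages; only the pattern string taken from the resource bundle changes.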
Message formats can also be used to customize the precise
format of dates, times, numbers, or currencies. If you only
specify the position of the argument, then a default for the
current locale will be chosen. However, you (for English) or the
localizer (for other languages) can also more precisely control
the format if you desire. This is done by adding additional keywords
or patterns after the argument number, as in the following
examples:
Argument |
new Date(97, 5, 13, 1, 0); // June 13, 1997, 1:00 AM (months are zero-based)
|
Pattern | Result |
"Delete files before {0}" | Delete files before 6/13/97 1:00 AM |
"Delete files before {0,date,long}" | Delete files before June 13, 1997 |
"Delete files before {0,date,yyyy.MMM.dd}" | Delete files before 1997.Jun.13 |
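A small sketch of the explicit-pattern case follows. The class name DatePatternDemo is an assumption, the locale is pinned to Locale.US so the month abbreviation is predictable, and the MessageFormat constructor that takes a Locale comes from Java releases later than 1.1.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class DatePatternDemo {
    /** Formats June 13, 1997 through the given message pattern, using US conventions. */
    static String render(String pattern) {
        Date date = new GregorianCalendar(1997, Calendar.JUNE, 13).getTime();
        MessageFormat mf = new MessageFormat(pattern, Locale.US);
        return mf.format(new Object[] {date});
    }

    public static void main(String[] args) {
        // Position only: the locale's default date and time format is used.
        System.out.println(render("Delete files before {0}"));
        // Keyword after the argument number: a named style.
        System.out.println(render("Delete files before {0,date,long}"));
        // Explicit pattern after the argument number.
        System.out.println(render("Delete files before {0,date,yyyy.MMM.dd}"));
    }
}
```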
For more information, see the Taligent
Java Demos and the JavaSoft
International Specification.
Number and date formats can also be used separately, with
similar control over their formatting. (Number formats handle
general numbers and currencies; date formats cover both dates and
times.) To globalize your program, replace the implicit
conversion of a number to a string with an explicit formatting
call, and put the pattern for the format into a resource bundle,
as in the following example:
Number Output
Old
|
myTextField.setText("" + myNumber);
|
// insert localized {key, value}, pairs below
...
|
New
|
NumberFormat nf = (NumberFormat)(MyResources.rb.getObject("PageNumberFormat"));
myTextField.setText(nf.format(myNumber));
|
// insert localized {key, value}, pairs below
{"PageNumberFormat", new DecimalFormat("#,##0")},
...
|
If you want to get only a string from the resources and create
your own number format, you can do so, but you must take
some care. You should always get a number format using getInstance(),
since a particular locale may have a specialized subclass of NumberFormat.
However, this subclass may not allow use of a pattern string. So
you need to check the type of the NumberFormat you get
before setting the pattern. (This should be simplified in a
future release of Java.)
Number Output from String
Resource
Old
|
myTextField.setText("" + myNumber);
|
// insert localized {key, value}, pairs below
...
|
New
|
NumberFormat nf = NumberFormat.getInstance();
if (nf instanceof DecimalFormat)
((DecimalFormat)nf).applyPattern(MyResources.rb.getString("PageNumberFormat"));
myTextField.setText(nf.format(myNumber));
|
// insert localized {key, value}, pairs below
{"PageNumberFormat", "#,##0"},
...
|
You can also programmatically alter number formats, such as by
setting the maximum or minimum number of decimals, or by deciding whether a
thousands separator is used. However, it is better practice to
use a pattern string instead, since, otherwise, you don't allow
your localizers to customize the format.
Of course, if you are formatting in a tight loop, you should
move the creation of the format
out of the loop! You can also make your formats static to avoid
repeated creations.
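To see what the pattern buys you, the sketch below applies the same pattern string under two locales. The class and method names are made up for the example; the pattern "#,##0" is the one used in the resource bundle above.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

final class PagePatternDemo {
    /** Formats n with the given pattern, using the symbols of the given locale. */
    static String format(long n, String pattern, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        if (nf instanceof DecimalFormat) {
            // Only DecimalFormat understands pattern strings.
            ((DecimalFormat) nf).applyPattern(pattern);
        }
        return nf.format(n);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567, "#,##0", Locale.US));
        System.out.println(format(1234567, "#,##0", Locale.GERMANY));
    }
}
```

The pattern says where grouping separators go; the locale decides which character is actually used for them.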
Instead of using methods on Integer, Float,
etc. to do conversion from Strings to numbers, dates,
times, etc., use the appropriate formats again for parsing. A Format
will parse what it can produce (and more), so you can use the
same one for output and input.
Number Input
Old
|
try {
myNumber = Integer.parseInt(myTextField.getText());
} catch (NumberFormatException e) {
alertBadNumber(e);
}
|
New
|
try {
    // parse() returns a Number; nf is the same NumberFormat used for output
    myNumber = nf.parse(myTextField.getText()).intValue();
} catch (ParseException e) {
    alertBadNumber(e);
}
|
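One detail worth a sketch: parse(String) accepts input as long as a number can be read from the start of the text, so trailing junk goes unnoticed. The helper below, whose name parseStrict is an assumption for this example, uses a ParsePosition to insist that the whole field was consumed.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class ParseDemo {
    /** Returns the parsed number, or null unless the entire text is a number. */
    static Number parseStrict(String text, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number n = nf.parse(text, pos); // returns null instead of throwing
        if (n == null || pos.getIndex() != text.length()) {
            return null;
        }
        return n;
    }

    public static void main(String[] args) {
        // German conventions: '.' groups digits, ',' is the decimal separator.
        System.out.println(parseStrict("1.234,5", Locale.GERMANY));
        System.out.println(parseStrict("12abc", Locale.US));
    }
}
```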
If you are creating your own display of date fields, such as for
an alarm clock widget, then you may want to display the different
component fields (year, month, date...) each in a separate TextField.
Then you will want to use a Calendar, which will convert
the standard Date into its components according to local
conventions.
Note
|
The order and choice of these fields may vary
according to local conventions. For example, the year may
come at the start of the date instead of the end, or the
date format may even consist of very different
information, such as year + day-in-year.
Currently, there is no simple way to get the order of the
fields in the format; that should be addressed in a
future release. In the meantime, if you intend to use FieldPosition
to determine the position of the fields with the text, be
warned that there is a bug that makes that difficult:
consult Taligent's Web site for a workaround. |
Calendar has special support for clock widgets. For
any given field, it can tell you the result of incrementing or
decrementing that field. It also supports a variant form of
incrementing/decrementing, called rolling, which gives you
the same effect as setting a field on your digital watch, in which
changing the minute field doesn't affect the hour: ...11:58,
11:59, 11:00, 11:01...
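The difference between incrementing and rolling can be sketched as follows. The class name RollDemo is made up, and the calendar is pinned to UTC so that daylight-saving rules cannot interfere with the example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class RollDemo {
    /** A calendar set to 11:59 on June 13, 1997 (UTC). */
    private static Calendar at1159() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.clear();
        cal.set(1997, Calendar.JUNE, 13, 11, 59);
        return cal;
    }

    private static String clock(Calendar cal) {
        return String.format(Locale.ROOT, "%02d:%02d",
            cal.get(Calendar.HOUR_OF_DAY), cal.get(Calendar.MINUTE));
    }

    /** Rolling the minute wraps it around without touching the hour. */
    static String afterRoll() {
        Calendar cal = at1159();
        cal.roll(Calendar.MINUTE, true);
        return clock(cal);
    }

    /** Adding a minute carries over into the hour field. */
    static String afterAdd() {
        Calendar cal = at1159();
        cal.add(Calendar.MINUTE, 1);
        return clock(cal);
    }

    public static void main(String[] args) {
        System.out.println("roll: " + afterRoll());
        System.out.println("add:  " + afterAdd());
    }
}
```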
For more information, see the Taligent
Java Demos and the JavaSoft
International Specification.
Fix String Comparison
The standard comparison in String does only a binary
comparison. For display strings, this is almost always incorrect!
Wherever the ordering or equality of strings is important
to the user, such as when presenting an alphabetized list,
use a Collator instead. Otherwise a German user, for example,
will find that you don't equate two strings that they consider
equal!
String Comparison
Old
|
if (string1.equals(string2)) {...
...
if (string1.compareTo(string2) < 0) {...
|
New
|
Collator col = Collator.getInstance();
if (col.equals(string1, string2)) {...
...
if (col.compare(string1, string2) < 0) {...
|
Of course, if you are comparing strings in a tight loop, you
should move the creation of the collator out of
the loop! You can also make your collator static to avoid
repeated creations.
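A short sketch of the difference, using made-up class and method names: binary ordering puts every uppercase ASCII letter before every lowercase one, while a collator orders the words the way a reader would expect.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollatorSortDemo {
    /** Sorts a copy of the input by binary (char value) order. */
    static String[] binarySort(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** Sorts a copy of the input by the conventions of the given locale. */
    static String[] collatorSort(String[] input, Locale locale) {
        Collator col = Collator.getInstance(locale); // created once, outside the sort
        String[] copy = input.clone();
        Arrays.sort(copy, col); // a Collator can be used as a Comparator
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        System.out.println(Arrays.toString(binarySort(words)));
        System.out.println(Arrays.toString(collatorSort(words, Locale.US)));
    }
}
```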
If a string is going to be compared multiple times, then use a
CollationKey instead. This preprocesses the string to
handle all of the international issues, and converts it into an
internal form that can be compared with a simple binary
comparison. This makes multiple comparisons much faster.
Using CollationKey
New
|
// make up a list of sort keys
CollationKey[] keys = new CollationKey[sourceStrings.length];
for (int i = 0; i < sourceStrings.length; ++i) {
    keys[i] = col.getCollationKey(sourceStrings[i]);
}
// now sort and stuff them into an AWT List
// (sort() stands for whatever sorting routine you use)
sort(keys);
List list = new List();
for (int i = 0; i < sourceStrings.length; ++i) {
    list.addItem(keys[i].getSourceString());
}
|
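For a version you can run without AWT, here is a sketch that sorts with collation keys and returns plain strings. The class name is hypothetical, and it relies on Arrays.sort(), which arrived in Java releases after 1.1, in place of a hand-written sort routine.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollationKeyDemo {
    /** Sorts strings by locale rules, computing each string's key only once. */
    static String[] sortWithKeys(String[] source, Locale locale) {
        Collator col = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[source.length];
        for (int i = 0; i < source.length; ++i) {
            keys[i] = col.getCollationKey(source[i]);
        }
        Arrays.sort(keys); // CollationKey is Comparable; comparisons are binary
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; ++i) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        String[] words = {"banana", "apple", "Cherry"};
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
    }
}
```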
There are also a number of advanced features in Collators, such
as the ability to merge in additional rules at runtime or
modify the rules. For example, you can make "b" sort
after "c", if you really wanted, or you can have "?"
sort exactly as if it were spelt out as
"question-mark". You can also use collators to do
correct native-language searching as well as sorting, using a CollationElementIterator.
However, this code is not straightforward, and I would recommend
waiting until there are methods in Java to do it for you.
For more information, see the Taligent
Java Demos and the JavaSoft
International Specification.
Use Character
Properties
If your code assumes that all characters of a given type (such
as letters or digits) are the ones in the ASCII range, then it
will break with foreign languages. Rather than test for
particular ranges of characters, you should use the Unicode
character properties wherever possible.
Replacing Range Tests
Old
|
for (i = 0; i < string.length(); ++i) {
char ch = string.charAt(i);
if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
// we have a letter, do something with it.
|
New
|
for (i = 0; i < string.length(); ++i) {
char ch = string.charAt(i);
if (Character.isLetter(ch)) {
// we have a letter (including non ASCII), do something with it.
|
A number of methods are defined for the more common Unicode
character properties. In addition, you have full access to all
the Unicode 2.0.14 character categories by using Character.getType().
For more information, see the JavaSoft
International Specification.
Replacing Type Tests
Old
|
for (i = 0; i < string.length(); ++i) {
char ch = string.charAt(i);
if (ch == '(' || ch == '{' || ch == '[') {
// we have an open brace, do something with it.
|
New
|
for (i = 0; i < string.length(); ++i) {
char ch = string.charAt(i);
if (Character.getType(ch) == Character.START_PUNCTUATION) {
// we have an open brace (including non ASCII), do something with it.
|
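Both kinds of test can be tried out with the following self-contained sketch. The class and method names are assumptions for the example; the sample characters are written as Unicode escapes so the source stays plain ASCII.

```java
final class CharPropsDemo {
    /** Counts letters of any script, not only a-z and A-Z. */
    static int countLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ++i) {
            if (Character.isLetter(s.charAt(i))) {
                ++count;
            }
        }
        return count;
    }

    /** True for any opening bracket, including non-ASCII ones. */
    static boolean isOpeningBracket(char ch) {
        return Character.getType(ch) == Character.START_PUNCTUATION;
    }

    public static void main(String[] args) {
        // "naive cafe 42" with accented letters: nine letters in all
        System.out.println(countLetters("na\u00efve caf\u00e9 42"));
        // U+FF08 is FULLWIDTH LEFT PARENTHESIS
        System.out.println(isOpeningBracket('\uFF08'));
    }
}
```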
Extend Word-Break
Detection
Word breaks in natural language are not only defined by spaces.
For example, when I search in this word processor for the word
"checked" with the option "Whole Words"
checked, I find the last instance of "checked" even
though it is not bounded by spaces (there is a comma at the end).
Even if you are using more sophisticated tests for ASCII text,
such as checking for various kinds of punctuation, you must now deal with
the wealth of possible characters in Unicode, and how they may
behave differently in different countries. By using a BreakIterator,
you can avoid dealing with these complexities.
Going Word-by-Word
BreakIterator boundary = BreakIterator.getWordInstance();
boundary.setText(stringToExamine);
int start = boundary.first();
for (int end = boundary.next();
     end != BreakIterator.DONE;
     start = end, end = boundary.next()) {
    System.out.println(stringToExamine.substring(start, end));
}
|
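The loop above returns every segment between boundaries, including spaces and punctuation. If you only want the words themselves, one possible approach (a sketch with made-up names, not the only way to do it) is to keep the segments that start with a letter or digit:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordListDemo {
    /** Returns the words of the text, skipping whitespace and punctuation segments. */
    static List<String> words(String text) {
        BreakIterator boundary = BreakIterator.getWordInstance(Locale.US);
        boundary.setText(text);
        List<String> words = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Is it checked, or not?"));
    }
}
```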
To find out whether a current index is at a word break, you
can use the following code (this should be in a convenience
routine in a future release):
Testing Word Breaks
if (currentIndex < 0 || currentIndex > stringToExamine.length())
    return false;
if (currentIndex == 0 || currentIndex == stringToExamine.length())
    return true;
int discard = boundary.following(currentIndex);
if (boundary.previous() == currentIndex)
    return true;
return false;
|
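Later Java releases did add such a convenience routine, BreakIterator.isBoundary(). The sketch below assumes you are running on one of those releases; the wrapper class and method names are made up for the example.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class BoundaryCheckDemo {
    /** True if index falls on a word boundary of text; false if out of range. */
    static boolean isWordBoundary(String text, int index) {
        if (index < 0 || index > text.length()) {
            return false;
        }
        BreakIterator boundary = BreakIterator.getWordInstance(Locale.US);
        boundary.setText(text);
        return boundary.isBoundary(index);
    }

    public static void main(String[] args) {
        String text = "checked, or";
        System.out.println(isWordBoundary(text, 7)); // between "checked" and ","
        System.out.println(isWordBoundary(text, 3)); // inside "checked"
    }
}
```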
You can use different break iterators to find word boundaries,
line-wrap boundaries, sentence boundaries and character
boundaries. The last may seem mysterious: character simply means
Unicode character, right? However, what native users consider a
single character may not be only a single Unicode character, and
user expectations may differ from country to country.
Note
|
In the Java code base is a DecompositionIterator
(it is currently private).
This actually walks through Unicode text and returns
normalized characters. For example, it maps the
compatibility characters (such as the FULLWIDTH EXCLAMATION
MARK) at the end of the Unicode range onto their
respective standard characters. Once this is made public,
then it can also be used in processing text. |
For more information, see the Taligent
Java Demos and the JavaSoft
International Specification.
Convert Non-Unicode
Text
As long as you are writing a pure Java application using only
Unicode characters, you don't have to worry about the thousands
of possible character sets out in the world. However, if you are
dealing with other data, then you will need to convert in and out
of Unicode.
Unfortunately, the API for doing character code conversions is
fairly limited at this time, although the hidden implementation
is quite extensive. There are two places where this API surfaces.
In each of them, you use a string to identify the non-Unicode
character set that you are converting to or from. You can attach
an encoding to a stream (OutputStreamWriter or InputStreamReader),
or you can specify the encoding on String, either when constructing
one from an array of bytes or when using the getBytes
method to convert to bytes.
Using Foreign Character Sets
// convert from ISO 8859-2 into Macintosh Central European
// (both calls can throw UnsupportedEncodingException)
String string = new String(foreignBytes, "8859_2");
byte[] otherBytes = string.getBytes("MacCentralEurope");
|
Note
|
Remember that the length of any conversion is not
necessarily the same as the length of the source. For
example, when converting the SJIS encoding to Unicode,
sometimes one byte will convert into a single Unicode
character, and sometimes two bytes will. |
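The change in length is easy to demonstrate. This sketch uses the java.nio.charset names from later Java releases ("ISO-8859-2" rather than the "8859_2" label above) and converts to UTF-8 bytes; the class and method names are assumptions for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeDemo {
    /** Decodes ISO 8859-2 bytes into a String, then re-encodes it as UTF-8. */
    static byte[] latin2ToUtf8(byte[] latin2Bytes) {
        String text = new String(latin2Bytes, Charset.forName("ISO-8859-2"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xB3 is the single byte for U+0142 (l with stroke) in ISO 8859-2.
        byte[] source = {(byte) 0xB3};
        byte[] converted = latin2ToUtf8(source);
        System.out.println(source.length + " byte(s) in, " + converted.length + " byte(s) out");
    }
}
```

Never size the output buffer from the input length; let the conversion tell you how long the result is.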
There is no programmatic way to get a list of the supported
character sets, other than to delve into the Sun directory in the
Java source. The following is a list of the sets currently supported on NT,
obtained in just that fashion. Unfortunately, there is no guarantee
that these will be present on every platform, nor is there yet
documentation of what some of the more obscure names in this list
actually refer to!
Foreign Character Set Labels
"Default" "8859_1"
"8859_2" "8859_3" "8859_4"
"8859_5" "8859_6" "8859_7"
"8859_8" "8859_9"
"Cp037" "Cp273" "Cp277"
"Cp278" "Cp280" "Cp284"
"Cp285" "Cp297" "Cp420"
"Cp424" "Cp437" "Cp500"
"Cp737" "Cp775" "Cp838"
"Cp850" "Cp852" "Cp855"
"Cp856" "Cp857" "Cp860"
"Cp861" "Cp862" "Cp863"
"Cp864" "Cp865" "Cp866"
"Cp868" "Cp869" "Cp870"
"Cp871" "Cp874" "Cp875"
"Cp918" "Cp921" "Cp922"
"Cp930" "Cp933" "Cp935"
"Cp937" "Cp939" "Cp942"
"Cp948" "Cp949" "Cp950"
"Cp964" "Cp970" "Cp1006"
"Cp1025" "Cp1026" "Cp1046"
"Cp1097" "Cp1098" "Cp1112"
"Cp1122" "Cp1123" "Cp1124"
"Cp1250" "Cp1251" "Cp1252"
"Cp1253" "Cp1254" "Cp1255"
"Cp1256" "Cp1257" "Cp1258"
"Cp1381" "Cp1383" "Cp33722"
"MS874"
"DBCS_ASCII" "DBCS_EBCDIC"
"EUC" "EUCJIS" "GB2312"
"JIS" "JIS0208" "KOI8_R"
"KSC5601" "SJIS"
"SingleByte" "Big5"
"CNS11643"
"MacArabic" "MacCentralEurope"
"MacCroatian" "MacCyrillic"
"MacDingbat" "MacGreek"
"MacHebrew" "MacIceland"
"MacRoman" "MacRomania"
"MacSymbol" "MacThai"
"MacTurkish" "MacUkraine"
"Unicode" "UnicodeBig"
"UnicodeLittle" "UTF8"
Only to Unicode: "JISAutoDetect"
Only from Unicode: "UnicodeBigUnmarked"
"UnicodeLittleUnmarked"
|
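If you are running on a later Java release (1.4 or newer, to the best of my knowledge), the java.nio.charset.Charset class does let you ask the runtime which character sets it supports. The following is only a sketch with made-up class and method names, and the names it reports follow the newer naming scheme, so they will not match the labels above one for one.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class CharsetListDemo {
    /** Canonical names of every character set this runtime can convert. */
    static List<String> names() {
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    /** True if the runtime knows the given character set name or alias. */
    static boolean isSupported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        System.out.println(names().size() + " character sets available");
        System.out.println("UTF-8 supported: " + isSupported("UTF-8"));
    }
}
```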
Handle Multilingual
Text
If you wish, you can go further and support a multilingual
application (or applet), which allows use of multiple
locales simultaneously. First you should understand an important
distinction between multilingual data and a multilingual
user interface:
- multilingual data: Users can enter data or set
data formats according to multiple locales (e.g. formats
of cells in a spreadsheet).
- multilingual user interface: Users can switch the
locale of the display of your application (Menus,
Buttons, etc.) at runtime.
You can support both in JDK 1.1, but most people don't find it
worth the effort to support a runtime multilingual user
interface.
Since Unicode is the character set for Java, the user can
enter in multilingual data (with some restrictions; see Limitations of JDK 1.1). All
of the formats, collators, and other international classes allow
you to pass an explicit Locale as a parameter. You can
thus give the user the choice of which locales to use for your
display locale or for data. This allows you, for example, to
easily have French dates in one column of a table and German
dates in another.
Multilingual Text Handling
NumberFormat nf = NumberFormat.getInstance(Locale.FRANCE);
// or
NumberFormat nf = NumberFormat.getInstance(new Locale("fr","",""));
|
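The table-column scenario mentioned above might look like the following sketch, in which the same Date is formatted once per locale. The class and method names are made up for the example.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class TwoLocaleDatesDemo {
    /** Formats June 13, 1997 in the long date style of the given locale. */
    static String longDate(Locale locale) {
        Date date = new GregorianCalendar(1997, Calendar.JUNE, 13).getTime();
        return DateFormat.getDateInstance(DateFormat.LONG, locale).format(date);
    }

    public static void main(String[] args) {
        // One value, two columns: neither call touches the default locale.
        System.out.println("French column: " + longDate(Locale.FRANCE));
        System.out.println("German column: " + longDate(Locale.GERMANY));
    }
}
```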
To find out the list of locales available for a particular
type of object, such as a NumberFormat, look for a static method
on that class (or its base class) called getAvailableLocales().
To then display the localized names of those locales, such as in
a Menu or List, use getDisplayName().
Listing Locales
numberLocaleMenu = new Menu("&Locale");
Locale[] locales = NumberFormat.getAvailableLocales();
for (int i = 0; i < locales.length; ++i) {
    numberLocaleMenu.add(locales[i].getDisplayName());
}
|
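The same idea without any AWT dependency, as a sketch: the helper below (a hypothetical name) turns an array of locales into display names, and lets you choose the language the names are shown in.

```java
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LocaleNamesDemo {
    /** Display names of the given locales, written in the language of displayLocale. */
    static List<String> displayNames(Locale[] locales, Locale displayLocale) {
        List<String> names = new ArrayList<>();
        for (Locale locale : locales) {
            names.add(locale.getDisplayName(displayLocale));
        }
        return names;
    }

    public static void main(String[] args) {
        // Every locale for which this runtime has number formats, named in English.
        Locale[] available = NumberFormat.getAvailableLocales();
        for (String name : displayNames(available, Locale.ENGLISH)) {
            System.out.println(name);
        }
    }
}
```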
The following table lists the locales that currently have
localized international objects (numbers, dates, etc.) in Java
1.1. If you create locales from the arguments listed, you get the
corresponding display names in the adjacent column. This list is
supplied only for comparison; you should always use code
to find the actual localized objects on your current system.
Notice that if you don't supply a specific country (or variant),
a default will be chosen.
JDK 1.1 Locales
Arguments | Display Name |
"ar","","" | Arabic (Egypt) |
"be","","" | Belorussian (Belarus) |
"bg","","" | Bulgarian (Bulgaria) |
"ca","","" | Catalan (Spain) |
"cs","","" | Czech (Czech Republic) |
"da","","" | Danish (Denmark) |
"de","","" | German (Germany) |
"de","AT","" | German (Austria) |
"de","CH","" | German (Switzerland) |
"el","","" | Greek (Greece) |
"en","","" | English (United States) |
"en","CA","" | English (Canada) |
"en","GB","" | English (United Kingdom) |
"en","IE","" | English (Ireland) |
"es","","" | Spanish (Spain) |
"et","","" | Estonian (Estonia) |
"fi","","" | Finnish (Finland) |
"fr","","" | French (France) |
"fr","BE","" | French (Belgium) |
"fr","CA","" | French (Canada) |
"fr","CH","" | French (Switzerland) |
"hr","","" | Croatian (Croatia) |
"hu","","" | Hungarian (Hungary) |
"is","","" | Icelandic (Iceland) |
"it","","" | Italian (Italy) |
"it","CH","" | Italian (Switzerland) |
"iw","","" | Hebrew (Israel) |
"ja","","" | Japanese (Japan) |
"ko","","" | Korean (Korea) |
"lt","","" | Lithuanian (Lithuania) |
"lv","","" | Latvian (Latvia) |
"mk","","" | Macedonian (Macedonia) |
"nl","","" | Dutch (Netherlands) |
"nl","BE","" | Dutch (Belgium) |
"no","","" | Norwegian (Bokmål) (Norway) |
"no","NO","NY" | Norwegian (Nynorsk) (Norway,NY) |
"pl","","" | Polish (Poland) |
"pt","","" | Portuguese (Portugal) |
"ro","","" | Romanian (Romania) |
"ru","","" | Russian (Russia) |
"sh","","" | Serbian (Latin) (Serbia) |
"sk","","" | Slovak (Slovakia) |
"sl","","" | Slovene (Slovenia) |
"sq","","" | Albanian (Albania) |
"sr","","" | Serbian (Cyrillic) (Serbia) |
"sv","","" | Swedish (Sweden) |
"tr","","" | Turkish (Turkey) |
"uk","","" | Ukrainian (Ukraine) |
"zh","","" | Chinese (China) |
"zh","TW","" | Chinese (ROC) |
Although for most applications a runtime multilingual user
interface is not worth the effort, if you do want to support it,
you will have to restructure your application somewhat. Essentially, you
must do one of the following:
- Separate out the code that builds your UI. When the user
picks a different UI locale from a menu, you reset the
default locale and then simply call your code to rebuild
the whole UI.
- Provide code that walks through your UI. When the user
picks a different UI locale from a menu, you reset the
default locale and call this code to go through each menu
and container to replace each individual element with the
appropriate new resources.