- Locales mini-HOWTO
- Peeter Joot, peeter_joot@vnet.ibm.com
- v1.5, 21 July 1997
-
- This document describes how to set up your Linux machine to use
- locales.
-
- 1. Introduction
-
- This is really a description of what I had to do to get localedef
- installed, compile some locales, and try them out. I did this just
- for fun, and thought that perhaps some people would be interested in
- trying it out themselves. Once it is set up you should be able to use
- NLS enabled applications with the locale of your choice. After a
- while, locale support should be part of the standard distributions,
- and most of this mini-HOWTO will be redundant.
-
- 2. What is a "locale" anyhow?
-
- Locales encapsulate some of the language/culture specific things that
- you shouldn't hard code in your programs.
-
- If you have various locales installed on your computer then you can
- select via the following list of environment variables how a locale
- sensitive program will behave. The default locale is the C, or POSIX
- locale which is hard coded in libc.
-
- LANG
- This sets the locale, but can be overridden with any other
- LC_xxxx environment variables
-
- LC_COLLATE
- Sort order.
-
- LC_CTYPE
- Character definitions, uppercase, lowercase, ... These are used
- by the functions like toupper, tolower, islower, isdigit, ...
-
- LC_MONETARY
- Contains the information necessary to format money in the
- fashion expected. It has the definitions of things like the
- thousands separator, decimal separator, and what the monetary
- symbol is and how to position it.
-
- LC_NUMERIC
- Thousands, and decimal separators, and the numeric grouping
- expected.
-
- LC_TIME
- How to specify the time, and date. This has the things like the
- days of the week, and months of the year in abbreviated, and non
- abbreviated form.
-
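- LC_MESSAGES
- Yes and no expressions. This category is also the one consulted
- when message catalogs are looked up (see the catopen section
- below).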
-
- LC_ALL
- This sets the locale, and overrides any other LC_xxxx
- environment variables.
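-
- The lookup order implied by the list above (LC_ALL first, then the
- specific LC_xxxx variable, then LANG, then the built in "C" locale)
- can be sketched in C. This is only an illustration and not the code
- that libc uses; the names pick_locale and effective_locale are made up
- for this example.
-
```c
#include <stdlib.h>

/* Return the first candidate that is set and non-empty, in the order
 * LC_ALL, the category variable, LANG; fall back to "C".
 * pick_locale is a hypothetical helper, not a libc function. */
const char *pick_locale(const char *lc_all, const char *lc_category,
                        const char *lang)
{
    if (lc_all && *lc_all)
        return lc_all;
    if (lc_category && *lc_category)
        return lc_category;
    if (lang && *lang)
        return lang;
    return "C";
}

/* Apply the same rule to the real environment, for example
 * effective_locale("LC_TIME"). */
const char *effective_locale(const char *category_var)
{
    return pick_locale(getenv("LC_ALL"), getenv(category_var),
                       getenv("LANG"));
}
```
-
- An empty variable is treated the same as an unset one, which is why
- the helper tests both the pointer and the first character.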
-
- Here are some example locale names; there are many more.
-
- en_CA
- English Canadian.
-
- en_US
- US English.
-
- de_DE
- Germany's German.
-
- fr_FR
- France's French.
-
- If you are writing a program and want it to be usable internationally,
- you should use locales. The most glaring reason for this is that
- not everybody is going to use the same character set/code page as you.
-
- Make sure in your programs that you don't do things like:
-
- /* check for alphabetic characters */
- if ( (( c >= 'a') && ( c <= 'z' )) ||
- (( c >= 'A') && ( c <= 'Z' )) ) { ... }
-
- If you write that type of code, your program assumes that the
- user/file/... is ASCII and nothing but ASCII, and it does not respect
- the code page definitions of the user's locale. For example, it
- precludes characters such as a-umlaut, which would be used in a German
- environment. What you should do instead is use the locale sensitive
- functions like isalpha(). If your program does explicitly require
- only US-ASCII alphabetics, you should still use the isalpha() function,
- but you must also either call setlocale(LC_CTYPE,"C") or set the LANG,
- LC_CTYPE, or LC_ALL environment variables to "C".
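-
- As a rough sketch of the advice above, the two helpers below count
- alphabetic characters with isalpha(). The function names are invented
- for this example. The first follows whatever LC_CTYPE locale the
- program has selected; the second forces the "C" locale, so only
- US-ASCII letters are counted.
-
```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Count alphabetic characters according to the current LC_CTYPE locale.
 * The cast to unsigned char matters: passing a negative char value to
 * isalpha() is undefined behaviour. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}

/* Same count, but forced to US-ASCII rules by selecting the "C" locale
 * for LC_CTYPE first. This changes the locale for the whole process. */
size_t count_ascii_alpha(const char *s)
{
    setlocale(LC_CTYPE, "C");
    return count_alpha(s);
}
```
-
- A program that wants the user's rules instead would call
- setlocale(LC_CTYPE, "") once at start up and then use count_alpha().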
-
- Locales allow a large degree of flexibility and make certain
- assumptions that a programmer may have made in ASCII based C programs
- invalid.
-
- For instance, you cannot assume the code positions of characters.
- There is nothing stopping you from creating a charmap file that
- defines the code position of 'A' to be 0xC1 rather than 0x41. This is
- in fact the code point mapping for 'A' in IBM code page 37, used on
- mainframes, while 0x41 is used for US-ASCII, iso8859-x, and
- others.
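-
- One way to avoid depending on code positions is to look a character
- up in a table instead of doing arithmetic on it. The sketch below
- (letter_index is a made up name) finds the position of an uppercase
- letter in the alphabet without assuming that 'A' through 'Z' are
- adjacent, which they are not in EBCDIC code pages.
-
```c
#include <string.h>

/* Return the 0-based position of an uppercase basic Latin letter in the
 * alphabet, or -1 if c is not one. A lookup table is used instead of
 * (c - 'A') because C only guarantees that the digits '0'..'9' are
 * contiguous; letters are not (EBCDIC has gaps between them). */
int letter_index(char c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')          /* strchr would match the terminator */
        return -1;
    p = strchr(letters, c);
    return p ? (int)(p - letters) : -1;
}
```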
-
- The basic idea is different people speak different languages, expect
- different sorting orders, use different code pages, and live in
- different countries. Locales and locale sensitive programs give one a
- means to respect such things, and handle them accordingly. It is not
- really much extra work to do so, it just requires a slightly different
- frame of mind when writing programs.
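-
- Sorting is a good example of how small the extra work is. The sketch
- below sorts an array of strings with strcoll(), which compares
- according to LC_COLLATE, where a byte-wise program would have used
- strcmp(). The names compare_collate and sort_words are only for
- illustration.
-
```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparison callback that orders strings by the rules of the
 * current LC_COLLATE locale instead of by raw byte values. */
static int compare_collate(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of n strings in place, in locale order. The caller is
 * expected to have called setlocale(LC_COLLATE, "") or similar. */
void sort_words(const char **words, size_t n)
{
    qsort(words, n, sizeof words[0], compare_collate);
}
```
-
- In the "C" locale strcoll() orders strings the same way strcmp()
- does, so a program written this way loses nothing when no other
- locale is installed.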
-
- 3. Notes.
-
- · In order to set up locales on my machine I had to upgrade a few
- things. Apparently ftp.tu-clausthal.de:/pub/linux/SLT/nls contains
- an a.out version of locale and localedef (in the file
- nlsutils-0.5.tar.gz), so if you don't have an ELF system, or don't
- want one, you can use the above. There is probably a copy of the
- nlsutils package some other place, but I have not looked for it. I
- hadn't known that there was a stand alone version of locale and
- localedef, and kind of figured that you would have to have the
- corresponding libc installed. Because of this a lot of this HOWTO
- is just a log of what I had to do to upgrade libc and family. If
- you do this as I have, you will need to be running an ELF system,
- or upgrade to one as you set up your locales.
-
- · The sorts of system upgrades that I did are the same sort of
- upgrades that have to be done to upgrade from a.out to ELF. If you
- haven't done this, or if you have upgraded to ELF by reinstalling
- Linux, then you should get the recent ELF HOWTO from a sunsite
- mirror. This is an excellent guide, and gives additional guidance
- for installing libc, ld.so, and other ELF system upgrades.
-
- · For anything that you install, read the appropriate release notes,
- or README type files. If you mess up your system by
- misinterpreting something that I say here, or ( hopefully not ) by
- doing something that I say in here, please don't blame me.
-
- · Mis-installing a new libc, and ld.so, could leave you with an
- unbootable system. You probably ought to have a boot disk handy,
- and make sure any critical, non-replaceable, data is backed up.
-
- 4. What you need.
-
- A few things need to be downloaded from various places. Everything
- here except for the locale source files can be obtained from
- sunsite.unc.edu, tsx-11.mit.edu, or, preferably, a local mirror of
- these sites. When I did this originally I used libc-5.2.18, which is
- now quite out of date. As of now I have been told that the current
- libc is 5.4.17, and this substitution has been made below. However,
- libc 5.4.17 will likely be old before you can blink, so just use the
- latest version when you do this.
-
- You may want to consider using glibc (GNU libc) rather than Linux libc
- 5 for any internationalization work. As of now glibc 2.0.4
- is available but no distributions have started using it as the
- standard libc yet (at least for Intel based Linux distributions). As
- well as being fully reentrant and having built in threading support,
- glibc is fully internationalized and has excellent
- internationalization support for programming. What
- internationalization has been done in libc 5 has been mostly taken
- from glibc. The locales and charmaps for glibc are bundled with the
- glibc locale add on.
-
- If you opt for using glibc then you can ignore this mini-howto.
- Including the locale add on in the glibc compilation and installation
- is trivial, and is covered in the glibc installation documentation.
- Be warned that a full upgrade is not a trivial job! I am hoping that
- Red Hat (which I use) will have a glibc based release soon, as I am not
- inclined to recompile my entire system.
-
- · locale and charmap sources --- These are what you compile using
- localedef.
-
- · libc-5.4.17.bin.tar.gz --- the ELF shared libraries for the C and
- math libraries. Note that the precompiled localedef program for
- libc 5.4.17 is apparently corrupt and creates an LC_CTYPE file with
- an invalid magic number. This probably means that an older
- localedef got into the binary distribution.
-
- · libc-5.4.17.tar.gz --- the source code for the ELF shared
- libraries. You may need this to compile localedef.
-
- · make-3.74.tar.gz --- you may need to compile make to incorporate a
- patch to fix the dirent bug.
-
- · release.libc-5.2.18 --- these release notes have the patch for
- make. It has been a while since this make bug happened, and it is
- likely that you don't have to worry about it.
-
- · ld.so-1.7.12+ --- the dynamic linker.
-
- · ELF gcc-2.7.2+ --- to compile things.
-
- · an ELF kernel ( e.g. 2.0.xx ) --- to compile things.
-
- · binutils 2.6.0.2+ --- to compile things.
-
- There are probably lots of places that you can get locale sources. I
- have found public domain locale and charmap sources at
- dkuug.dk:/i18n/WG15-collection/locales
- <ftp://dkuug.dk/i18n/WG15-collection/locales> and
- dkuug.dk:/i18n/WG15-collection/charmaps
- <ftp://dkuug.dk/i18n/WG15-collection/charmaps> respectively.
-
- 5. Installing everything.
-
- This is what I did to install everything. I already had an ELF system
- ( compiler, kernel, ... ) installed before I did this.
-
- 1. First I installed the binutils package:
-
- tar xzf binutils-2.6.0.2.bin.tar.gz -C /
-
- 2. Next I installed the dynamic linker:
-
- tar zxf ld.so-1.7.12.tar.gz -C /usr/src
- cd /usr/src/ld.so-1.7.12
- sh instldso.sh
-
- 3. Next I installed the libc binaries. See release.libc-5.4.17 for
- more instructions.
-
- rm -f /usr/lib/libc.so /usr/lib/libm.so
- rm -f /usr/include/iolibio.h /usr/include/iostdio.h
- rm -f /usr/include/ld_so_config.h /usr/include/localeinfo.h
- rm -rf /usr/include/netinet /usr/include/net /usr/include/pthread
- tar -xzf libc-5.4.17.bin.tar.gz -C /
-
- 4. Now ldconfig must be run to locate the new shared libraries:
-
- ldconfig -v
-
- 5. There is a bug that was fixed in libc that breaks make, and some
- other programs. Here is what I did in order to rebuild and install
- make.
-
- tar zxf make-3.74.tar.gz -C /usr/src
- cd /usr/src/make-3.74
- patch < /wherever_you_put_it/release.libc-5.4.17
- ./configure --prefix=/usr
- sh build.sh
- ./make install
- cd ..
- rm -rf make-3.74
-
- 6. Now localedef can be compiled and installed.
-
- mkdir /usr/src/libc
- tar zxf libc-5.4.17.tar.gz -C /usr/src/libc
- cd /usr/src/libc
- cd include
- ln -s /usr/src/linux/include/asm .
- ln -s /usr/src/linux/include/linux .
- cd ../libc
- ./configure
- # I am not sure if these two makes are necessary, but just to be safe :
- make clean ; make depend
- cd locale
- make programs
- mv localedef /usr/local/bin
- mv locale /usr/local/bin
-
- 7. Put the charmaps where localedef will find them. This uses the
- charmaps and locale sources which I downloaded from the dkuug.dk
- ftp site as charmaps.tar and locales.tar respectively. The older
- localedef (5.2.18) looked in /usr/share/nls/charmap for charmap
- sources, but now localedef looks in /usr/share/i18n/charmaps and
- /usr/share/i18n/locales by default for the charmap and locale
- sources:
-
- mkdir /usr/share/i18n
- mkdir /usr/share/i18n/charmaps
- mkdir /usr/share/i18n/locales
- tar xf charmaps.tar -C /usr/share/i18n/charmaps
- tar xf locales.tar -C /usr/share/i18n/locales
-
- The newer localedef (5.4.17) has been made smarter and will look for
- other locale source files when handling the `copy' statement, whereas
- the older localedef needed to have the locale objects already created
- in order to handle the copy statement. This list of commands has the
- dependencies sorted out and can be used to generate all the locale
- objects regardless of which libc version is being used, but you should
- now be able to create only the ones that you wish.
-
- localedef -ci en_DK -f ISO_8859-1:1987 en_DK
- localedef -ci sv_SE -f ISO_8859-1:1987 sv_SE
- localedef -ci fi_FI -f ISO_8859-1:1987 fi_FI
- localedef -ci sv_FI -f ISO_8859-1:1987 sv_FI
- localedef -ci ro_RO -f ISO_8859-1:1987 ro_RO
- localedef -ci pt_PT -f ISO_8859-1:1987 pt_PT
- localedef -ci no_NO -f ISO_8859-1:1987 no_NO
- localedef -ci nl_NL -f ISO_8859-1:1987 nl_NL
- localedef -ci fr_BE -f ISO_8859-1:1987 fr_BE
- localedef -ci nl_BE -f ISO_8859-1:1987 nl_BE
- localedef -ci da_DK -f ISO_8859-1:1987 da_DK
- localedef -ci kl_GL -f ISO_8859-1:1987 kl_GL
- localedef -ci it_IT -f ISO_8859-1:1987 it_IT
- localedef -ci is_IS -f ISO_8859-1:1987 is_IS
- localedef -ci fr_LU -f ISO_8859-1:1987 fr_LU
- localedef -ci fr_FR -f ISO_8859-1:1987 fr_FR
- localedef -ci de_DE -f ISO_8859-1:1987 de_DE
- localedef -ci de_CH -f ISO_8859-1:1987 de_CH
- localedef -ci fr_CH -f ISO_8859-1:1987 fr_CH
- localedef -ci en_CA -f ISO_8859-1:1987 en_CA
- localedef -ci fr_CA -f ISO_8859-1:1987 fr_CA
- localedef -ci fo_FO -f ISO_8859-1:1987 fo_FO
- localedef -ci et_EE -f ISO_8859-1:1987 et_EE
- localedef -ci es_ES -f ISO_8859-1:1987 es_ES
- localedef -ci en_US -f ISO_8859-1:1987 en_US
- localedef -ci en_GB -f ISO_8859-1:1987 en_GB
- localedef -ci en_IE -f ISO_8859-1:1987 en_IE
- localedef -ci de_LU -f ISO_8859-1:1987 de_LU
- localedef -ci de_BE -f ISO_8859-1:1987 de_BE
- localedef -ci de_AT -f ISO_8859-1:1987 de_AT
- localedef -ci sl_SI -f ISO_8859-2:1987 sl_SI
- localedef -ci ru_RU -f ISO_8859-5:1988 ru_RU
- localedef -ci pl_PL -f ISO_8859-2:1987 pl_PL
- localedef -ci lv_LV -f BALTIC lv_LV
- localedef -ci lt_LT -f BALTIC lt_LT
- localedef -ci iw_IL -f ISO_8859-8:1988 iw_IL
- localedef -ci hu_HU -f ISO_8859-2:1987 hu_HU
- localedef -ci hr_HR -f ISO_8859-4:1988 hr_HR
- localedef -ci gr_GR -f ISO_8859-7:1987 gr_GR
-
- 6. Now what.
-
- After doing all the stuff above you should now be able to use the
- locales that have been created. Here is a simple example program.
-
- /* test.c : a simple test to see if the locales can be loaded, and
-  * used */
- #include <locale.h>
- #include <stdio.h>
- #include <time.h>
-
- int main(void)
- {
-     time_t t;
-     struct tm *now;
-     char buf[256];
-
-     time(&t);
-     now = gmtime(&t);
-
-     /* use the LC_TIME setting from the environment */
-     if (setlocale(LC_TIME, "") == NULL)
-         fprintf(stderr, "locale not available, using C\n");
-     strftime(buf, sizeof buf, "%c", now);
-
-     printf("%s\n", buf);
-     return 0;
- }
-
- You can use the locale program to see what your current locale
- environment variable settings are.
-
- $ # compile the simple test program above, and run it with
- $ # some different locale settings
- $ gcc -s -o Test test.c
- $ # see what the current locale is :
- $ locale
- LANG=POSIX
- LC_COLLATE="POSIX"
- LC_CTYPE="POSIX"
- LC_MONETARY="POSIX"
- LC_NUMERIC="POSIX"
- LC_TIME="POSIX"
- LC_MESSAGES="POSIX"
- LC_ALL=
- $ # Ho, hum... we're using the boring C locale
- $ # let's change to English Canadian:
- $ export LC_TIME=en_CA
- $ ./Test
- Sat 23 Mar 1996 07:51:49 PM
- $ # let's try French Canadian:
- $ export LC_TIME=fr_CA
- $ ./Test
- sam 23 mar 1996 19:55:27
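-
- The same experiment can be done from inside a program by passing a
- locale name to setlocale() instead of "". The helper below is a
- sketch with an invented name, format_in_locale. It reports failure
- when the locale cannot be selected, which is what you will see for
- any locale that has not been compiled with localedef.
-
```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format *when with strftime() under the named LC_TIME locale.
 * Returns the number of bytes written, or 0 if the locale could not
 * be selected or the result did not fit in buf. The process-wide
 * LC_TIME setting stays changed afterwards. */
size_t format_in_locale(const char *locale_name, const char *format,
                        const struct tm *when, char *buf, size_t size)
{
    if (setlocale(LC_TIME, locale_name) == NULL)
        return 0;
    return strftime(buf, size, format, when);
}
```
-
- Calling it with "fr_CA" and then "en_CA" on the same struct tm would
- be the in-program equivalent of the two runs shown above, provided
- both locales are installed.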
-
- 7. catopen bug fix.
-
- Installing the locales fixes a bug (feature ?) that is in the catopen
- function in Linux libc. Say you create a program that uses message
- catalogs, and you create a German catalog and put it in
- /home/peeter/catalogs/de_DE.
-
- Now upon doing the following, without the de_DE locale installed :
-
- export LC_MESSAGES=de_DE
- export NLSPATH=/home/peeter/catalogs/%L/%N.cat:$NLSPATH
-
- the German message catalog does not get opened, and the default
- messages in the catgets calls are used.
-
- This is because catopen does a setlocale call to get the right message
- category, and the setlocale fails even though the environment variable
- has been set. catopen then attempts to load the message catalog
- substituting "C" for all the "%L"'s in the NLSPATH.
-
- You can still use your message catalog without installing the locale,
- but you would have to explicitly set the "%L" part of the NLSPATH, like
-
- export NLSPATH=/home/peeter/catalogs/de_DE/%N.cat:$NLSPATH
-
- but this defeats the whole purpose of the locale category
- environment variables.
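-
- A program can check for this situation itself, since setlocale()
- returns NULL when the requested locale cannot be loaded. The helper
- below is a sketch under that assumption; locale_is_installed is a
- made up name, and the previous setting is restored before returning.
-
```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if setlocale() accepts the named locale for LC_MESSAGES,
 * 0 if not. The previous setting is saved and restored, so the
 * caller's locale is left unchanged. */
int locale_is_installed(const char *name)
{
    char *saved = NULL;
    const char *old = setlocale(LC_MESSAGES, NULL);
    int ok;

    if (old != NULL) {
        /* copy it: the returned string may be overwritten by the
         * next setlocale() call */
        saved = malloc(strlen(old) + 1);
        if (saved != NULL)
            strcpy(saved, old);
    }
    ok = setlocale(LC_MESSAGES, name) != NULL;
    if (saved != NULL) {
        setlocale(LC_MESSAGES, saved);
        free(saved);
    }
    return ok;
}
```
-
- For example, a program could call locale_is_installed("de_DE") and
- print a warning when it returns 0, rather than silently falling back
- to the default messages.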
-
- 8. Questions and Answers.
-
- This section could grow into a FAQ, but isn't really one yet.
-
- 8.1. msgcat question
-
- I am a user of Linux, and have written the following test program:
-
- --------------------------------------------------------------------
- #include <stdio.h>
- #include <locale.h>
- #include <features.h>
- #include <nl_types.h>
-
- main(int argc, char ** argv)
- {
- nl_catd catd;
-
- setlocale(LC_MESSAGES, "");
- catd = catopen("msg", MCLoadBySet);
- fprintf(stderr,catgets(catd, 1, 1, "locale message fail\n"));
- catclose(catd);
- }
- --------------------------------------------------------------------
- $ msg.m
- $set 1
-
- 1 locale message pass\n
- --------------------------------------------------------------------
-
- If I use an absolute path in catopen, like
- catopen("/etc/locale/msg.cat",MCLoadBySet);, I get the right result.
- But if I use the example above, catopen returns -1 (failure).
-
- 8.2. msgcat answer
-
- This question is sort of answered in the previous section, but here is
- some additional information.
-
- There are a number of valid places where you can put your message
- catalogs. Even though you may not have NLSPATH explicitly defined in
- your environment settings it is defined in libc as follows :
-
- $ strings /lib/libc.so.5.4.17 | grep locale | grep %L
- /etc/locale/%L/%N.cat:/usr/lib/locale/%L/%N.cat:/usr
- /lib/locale/%N/%L:/usr/share/locale/%L/%N.cat:/usr/
- local/share/locale/%L/%N.cat
-
- So if you have done one of:
-
- $ export LC_MESSAGES=en_CA
- $ export LC_ALL=en_CA
- $ export LANG=en_CA
-
- then, with the NLSPATH above, catopen("msg", MCLoadBySet); should
- work if your message catalog has been copied to any one of:
-
- /etc/locale/en_CA/msg.cat
- /usr/lib/locale/en_CA/msg.cat
- /usr/lib/locale/msg/en_CA
- /usr/share/locale/en_CA/msg.cat
- /usr/local/share/locale/en_CA/msg.cat
-
- This, however, will not work if you don't have the en_CA locale
- installed because the setlocale will fail, and "C" will be substituted
- for "%L" in the catopen routine ( rather than "en_CA" ).
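-
- To make the substitution concrete, here is a simplified sketch of how
- one NLSPATH entry could be expanded. It is not the libc code, and the
- name expand_nlspath_entry is invented; it only handles "%L", "%N" and
- "%%", while the real catopen knows a few more sequences.
-
```c
#include <stddef.h>
#include <string.h>

/* Expand one NLSPATH entry: "%L" becomes the locale name, "%N" the
 * catalog name, "%%" a literal '%'. Any other character is copied
 * unchanged. Returns 0 on success, -1 if the result (including the
 * terminating NUL) does not fit in out[outsize]. */
int expand_nlspath_entry(const char *tmpl, const char *locale,
                         const char *name, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    while (*tmpl != '\0') {
        const char *piece = tmpl;   /* default: copy one character */
        size_t len = 1;

        if (tmpl[0] == '%' && tmpl[1] == 'L') {
            piece = locale;
            len = strlen(locale);
            tmpl += 2;
        } else if (tmpl[0] == '%' && tmpl[1] == 'N') {
            piece = name;
            len = strlen(name);
            tmpl += 2;
        } else if (tmpl[0] == '%' && tmpl[1] == '%') {
            tmpl += 2;              /* piece still points at one '%' */
        } else {
            tmpl += 1;
        }
        if (used + len >= outsize)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```
-
- Expanding "/etc/locale/%L/%N.cat" with the locale "en_CA" and the
- name "msg" gives /etc/locale/en_CA/msg.cat, while the fallback locale
- "C" gives /etc/locale/C/msg.cat, which is why the catalog is not
- found when the en_CA locale is missing.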
-
- 9. More information.
-
- Well that's it. Hopefully this guide has been some help to you.
- There are probably lots of places that you can look for additional
- information on writing locale sensitive programs, and documents on
- internationalization, and localization in general. I'll bet that if
- you browse the web a bit you will be able to find a lot of info.
- Ulrich Drepper, who implemented much of the GNU internationalization
- code, has some information about internationalization and localization
- on his home page <http://i44www.info.uni-karlsruhe.de/~drepper>, and
- you can look there to start. There is also some information in the
- info pages for libc, and of course, there are always man pages.
-
-