In short, I suggest
simply erasing or re-setting the NLSPATH, unless you have a trusted user
supplying the value.
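
As a minimal sketch of that advice, assuming the program builds a cleaned environment before doing any message lookups, the following Python removes NLSPATH or resets it to a value the program itself trusts. The helper name `drop_nlspath` and its `trusted_value` parameter are hypothetical, chosen only for illustration.

```python
def drop_nlspath(env, trusted_value=None):
    """Return a copy of env with NLSPATH erased, or reset to a trusted value.

    env is any mapping of variable names to values (for example os.environ).
    trusted_value is a hypothetical parameter: the caller supplies it only
    when the replacement comes from a trusted source.
    """
    cleaned = dict(env)
    if trusted_value is None:
        # Erase the attacker-influenced value entirely.
        cleaned.pop("NLSPATH", None)
    else:
        # Overwrite whatever was inherited with the trusted setting.
        cleaned["NLSPATH"] = trusted_value
    return cleaned
```

A setuid/setgid program or a CGI script would typically apply something like this to a copy of `os.environ` and pass the result on to any child processes.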
For the Accept-Language header in HTTP (if you use it),
form values specifying the locale, and the environment variables
LANGUAGE, LANG, the old LINGUAS, LC_ALL, and the other LC_* values listed
above,
filter the locales from untrusted users so that a value is accepted only
if it is null (empty) or matches, in its entirety, this regular expression
(note that I've recently added "="):
[A-Za-z][A-Za-z0-9_,+@\-\.=]*
I haven't found any legitimate locale which doesn't match this pattern,
and the pattern does appear to protect against locale attacks.
Of course, there's no guarantee that there are messages available
in the requested locale,
but in such a case these routines will fall back to the default
messages (usually in English), which at least is not a security problem.
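
The filtering described above might look like the following sketch. It uses Python's `re` module rather than regex(3), and `re.fullmatch` supplies the "match in total" requirement. The names `is_safe_locale` and `filter_locale_env` are hypothetical. Treating every variable whose name starts with `LC_` as a locale variable is an assumption made here to stand in for the list given earlier. Dropping a bad value (rather than aborting) is likewise just one possible policy.

```python
import re

# The permissive pattern from the text. fullmatch() requires the whole
# value to match, so a trailing newline or other stray byte is rejected.
LOCALE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_,+@\-\.=]*")

# Variables named explicitly in the text; LC_* names are matched by prefix
# below (an assumption standing in for the full list).
LOCALE_VARS = ("LANGUAGE", "LANG", "LINGUAS")


def is_safe_locale(value):
    """Accept an empty value or one that matches the pattern in total."""
    return value == "" or LOCALE_RE.fullmatch(value) is not None


def filter_locale_env(env):
    """Return a copy of env without locale variables that fail the check."""
    cleaned = {}
    for name, value in env.items():
        is_locale = name in LOCALE_VARS or name.startswith("LC_")
        if is_locale and not is_safe_locale(value):
            continue  # drop it, so lookups fall back to the default messages
        cleaned[name] = value
    return cleaned
```

The same `is_safe_locale` check can be applied to a locale taken from a form value before it is passed to any message-lookup routine. If you port the pattern to regcomp(3), be aware that POSIX treats a backslash inside a bracket expression as a literal character, so there it is safer to drop the backslashes and place the hyphen last, as in `[A-Za-z0-9_,+@.=-]`.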
If you wish to be really picky, and only permit values that match li18nux's
locale pattern, you can use this pattern instead:
^[A-Za-z]+(_[A-Za-z]+)?
(\.[A-Z]+(\-[A-Z0-9]+)*)?
(\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
(,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?$
In both cases, these patterns use POSIX's extended ("modern")
regular expression notation (see regex(3) and regex(7) on Unix-like systems).
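
To show how the stricter pattern behaves, here is a sketch that assumes the four lines above are meant to be concatenated into one expression. It again uses Python's `re` module in place of regex(3); `re.VERBOSE` lets the pattern keep the same four-part layout, and `re.fullmatch` replaces the explicit `^` and `$` anchors. The function name `is_li18nux_locale` and the sample values in the comments are hypothetical.

```python
import re

# Strict pattern from the text, laid out as language[_territory],
# optional .CODESET, and optional @modifier=value[,modifier=value...].
STRICT_LOCALE_RE = re.compile(
    r"""
    [A-Za-z]+(_[A-Za-z]+)?
    (\.[A-Z]+(\-[A-Z0-9]+)*)?
    (\@[A-Za-z0-9]+(\=[A-Za-z0-9\-]+)
    (,[A-Za-z0-9]+(\=[A-Za-z0-9\-]+))*)?
    """,
    re.VERBOSE,
)


def is_li18nux_locale(value):
    """True only if the entire value matches the strict pattern."""
    return STRICT_LOCALE_RE.fullmatch(value) is not None


# Illustrative values (not taken from the text):
#   "en_US.UTF-8"                   is accepted
#   "de_DE.ISO-8859-1@currency=EUR" is accepted
#   "en_US.utf8"                    is rejected (codeset must be upper case)
#   "de_DE@euro"                    is rejected (modifier needs "=value")
```

As the sample values suggest, this pattern is noticeably stricter than the first one: it rejects some spellings that are common in practice, so it fits best where you control which locale names are in use.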
Of course, languages cannot be supported without a
standard way to represent their written symbols, which brings
us to the issue of character encoding.