Internationalization

2 January 2008

And now for something completely different…

In this day of multiple languages, with computer users from all over the world (sometimes working in the same office building), it is worthwhile to set up systems with internationalization. Basically, this means that the system can handle eight-bit characters and can accept and display text in languages other than English.

On a UNIX or Linux system, this is controlled by a family of environment variables built around the LANG environment variable. The current settings can be seen quickly with the locale command. This is the output from my Fedora 7 system:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

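Each of these can be set separately. LANG supplies the default for every category; each individual LC_* variable overrides LANG for its own category; and LC_ALL, if set, overrides all of them. In the output above, LC_ALL is empty, and the quoted values indicate categories that are simply inheriting the value of LANG. There is an excellent, in-depth description of these variables available.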
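The POSIX lookup order for a single category is: a non-empty LC_ALL wins, then the category's own variable (such as LC_TIME), then LANG, and finally the C locale. The function below is a minimal sketch of that rule; the name effective_locale is hypothetical, not a standard command, and it only reports which name would be chosen without checking that the locale is actually installed.

```shell
#!/bin/sh
# effective_locale CATEGORY
# Hypothetical helper (not a standard utility): print the locale name that
# the POSIX precedence rules select for CATEGORY (e.g. LC_TIME).
# Order: non-empty LC_ALL, then non-empty CATEGORY, then non-empty LANG,
# then the C locale. It does not verify that the locale is installed.
effective_locale() {
    category=$1
    # Indirect lookup of the variable named by $category (POSIX sh has no ${!var}).
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# Examples:
#   LANG=en_US.UTF-8 LC_TIME=ru_RU.UTF-8 LC_ALL= effective_locale LC_TIME   # ru_RU.UTF-8
#   LANG=en_US.UTF-8 LC_TIME=ru_RU.UTF-8 LC_ALL=C effective_locale LC_TIME  # C
```

Because the function only compares strings, the example locale names do not need to be installed for it to work.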

All of the locales available on the system can be listed with the locale -a command. Here are a few selected entries from the long list (654 locales on Fedora 7!) that erupts from this command:

$ locale -a

ca_FR
ca_FR.iso885915
ca_FR.utf8

en_US
en_US.iso88591
en_US.iso885915
en_US.utf8

ru_RU
ru_RU.iso88595
ru_RU.koi8r
ru_RU.utf8

The first example is Catalan as used in France; the second, American English; the third, Russian as used in Russia (as opposed to Russian in Ukraine or Tatar in Russia). Each name consists of an ISO 639-1 language code (the first two lowercase letters), an ISO 3166-1 country code (the two uppercase letters after the underscore), and then, after the dot, the name of the specific encoding to use. The examples include ISO 8859-1 (Latin 1), ISO 8859-15 (Latin 9), ISO 8859-5 (Cyrillic), UTF-8 (Unicode), and KOI8-R (Cyrillic). The traditional “C” locale is named C or, equivalently, POSIX; both refer to the traditional 7-bit ASCII representation. The best choice is a UTF-8 locale (such as en_US.utf8), since the ISO working group responsible for ISO 8859 has disbanded and the standard is no longer maintained.
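The naming scheme can be taken apart mechanically. The sketch below assumes names of the form language_COUNTRY.encoding, optionally followed by an @modifier such as @euro; both function names are hypothetical, not standard commands. The second helper filters locale -a output for a UTF-8 variant, accepting both the utf8 spelling that locale -a prints on Fedora and the UTF-8 spelling commonly used in LANG.

```shell
#!/bin/sh
# split_locale NAME
# Hypothetical helper: break a locale name such as ru_RU.koi8r into its
# language (ISO 639-1), country (ISO 3166-1) and encoding parts.
# Any @modifier suffix is dropped; missing parts are printed empty.
split_locale() {
    name=${1%%@*}
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
        *)   codeset= ;;
    esac
    case $name in
        *_*) language=${name%%_*}; territory=${name#*_} ;;
        *)   language=$name; territory= ;;
    esac
    printf 'language=%s territory=%s codeset=%s\n' "$language" "$territory" "$codeset"
}

# pick_utf8_locale PREFIX
# Hypothetical helper: read locale names (one per line, as printed by
# `locale -a`) on stdin and print the first UTF-8 variant of PREFIX,
# matching either spelling (e.g. en_US.utf8 or en_US.UTF-8).
pick_utf8_locale() {
    grep -i -E "^$1\.utf-?8\$" | head -n 1
}

# Examples:
#   split_locale ru_RU.koi8r          # language=ru territory=RU codeset=koi8r
#   locale -a | pick_utf8_locale ru_RU
```

On glibc systems such as Fedora, encoding names are normalized when a locale is looked up, which is why en_US.UTF-8 in LANG selects the locale that locale -a lists as en_US.utf8.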

It is also necessary to make sure that the connection between your display and the system is what is called “eight-bit clean”: every byte travels from the source system to your display with all eight bits preserved intact. More specifically, the entire path from keyboard to display must be eight-bit clean for things to work properly.
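One rough way to test part of that path is to push a few bytes with the high bit set through it and count how many arrive with that bit still set. This is a minimal sketch assuming od(1) and awk(1) are available; the function name high_bit_bytes and the host name remotehost are hypothetical. The test string is the UTF-8 encoding of é (octal 303 251), which should survive as two high-bit bytes. It checks the data path only, not whether the terminal renders the character correctly.

```shell
#!/bin/sh
# high_bit_bytes
# Hypothetical helper: read bytes on stdin and print how many have the
# high (eighth) bit set, i.e. a value of 128 or more.
high_bit_bytes() {
    # od -An -v -tu1 prints every byte as an unsigned decimal number.
    od -An -v -tu1 | awk '{ for (i = 1; i <= NF; i++) if ($i >= 128) n++ } END { print n + 0 }'
}

# Local check: é in UTF-8 is the two bytes 0xC3 0xA9, so this prints 2.
#   printf '\303\251' | high_bit_bytes
# Through a link under test (remotehost is a placeholder):
#   printf '\303\251' | ssh remotehost cat | high_bit_bytes
# A 7-bit path that strips the top bit would report 0 instead of 2.
```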

These variables select the character sets and conventions to use; however, programs must still be translated into other languages and must be prepared to handle the language in question. If a program has not been translated into Russian, setting ru_RU.utf8 will not change its messages, which will most likely remain in English. Some programs may even need to be configured separately for a different language.
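In shell scripts, translation is commonly done with the gettext(1) command from GNU gettext, which looks up a message in the catalog for the current text domain and prints the original text unchanged when no translation exists. The sketch below assumes a hypothetical text domain named myscript for which no catalog is installed, so the message stays in English whatever LANG says; the say function is also hypothetical, and it falls back to printf when gettext is not installed.

```shell
#!/bin/sh
# Hypothetical text domain. A translated script would ship a compiled
# catalog such as <locale dir>/ru/LC_MESSAGES/myscript.mo, where the
# locale directory is typically /usr/share/locale or $TEXTDOMAINDIR.
TEXTDOMAIN=myscript
export TEXTDOMAIN

# say MSGID
# Print the translation of MSGID for the current locale, or MSGID itself
# when there is no translation or no gettext command.
say() {
    if command -v gettext >/dev/null 2>&1; then
        # Command substitution drops any trailing newline; printf adds one back.
        printf '%s\n' "$(gettext "$1")"
    else
        printf '%s\n' "$1"
    fi
}

# Example: with no Russian catalog for "myscript", this still prints English.
#   LANG=ru_RU.UTF-8 say 'File not found'
```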

There is also keyboard mapping, which brings its own set of challenges and configuration. On Linux, xmodmap handles keyboard mapping under X, and loadkeys handles it on the text console. The console keymap programs are included in the kbd RPM (or package).

Filed under: Linux, UNIX.

David Douthitt

David is an experienced UNIX and Linux system administrator, a former Linux distribution maintainer, and author of two books ("Advanced Topics in System Administration" and "GNU Screen: A Comprehensive Manual").