Java Webapp localization through Gettext

Introduction

As you may know, native Java localization support is shaky at best. ResourceBundles are ordinary Java classes, but in practice they are usually backed by “.properties” files: simple key-value files containing an abstract name for each string along with its associated translation.
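To make the key-value idea concrete, here is a minimal sketch of the standard mechanism. The keys and the Italian text are made up for illustration, and the catalogue is read from a string instead of a real Messages_it.properties file so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class PropertiesBundleDemo {
    // Hypothetical content of a file such as Messages_it.properties:
    // an abstract key on the left, the translation on the right.
    static final String ITALIAN =
        "login.greeting=Benvenuto\n" +
        "login.logout=Esci\n";

    // Builds a bundle from .properties text held in memory.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(ITALIAN);
        // The code only ever sees the abstract key, never the real text.
        System.out.println(bundle.getString("login.greeting"));
    }
}
```

Notice that the calling code says nothing about what “login.greeting” actually reads like, which is exactly the readability problem discussed next.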

Although this sounds great (“if support for l10n is already built into the language, why use an external tool?”, you may ask), some severe limitations exist. Handling singular and plural forms is bound to give you a headache, especially if the target language has more than two plural forms (like Polish, Romanian, Russian and many others). Moreover, the abstract keys of .properties files make it difficult both to keep the code immediately understandable for the developer and to give the translator enough information.
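To see why a simple singular/plural switch is not enough, consider Polish, which needs three forms. The sketch below hard-codes the selection rule commonly found in the Plural-Forms header of Polish .po files; the class name and the sample noun are only illustrative. With Gettext this rule lives in the catalogue, not in your code.

```java
class PluralRuleDemo {
    // Returns which of the three Polish plural forms applies to n:
    // 0 for exactly one, 1 for "a few" (2-4, 22-24, ... but not 12-14),
    // 2 for everything else.
    static int polishPluralIndex(long n) {
        if (n == 1) {
            return 0;
        }
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 10 || mod100 >= 20)) {
            return 1;
        }
        return 2;
    }

    // Sample noun ("file"); \u00f3 is the letter o with an acute accent.
    static final String[] FORMS = { "plik", "pliki", "plik\u00f3w" };

    public static void main(String[] args) {
        long[] samples = { 1, 2, 5, 22, 112 };
        for (long n : samples) {
            System.out.println(n + " " + FORMS[polishPluralIndex(n)]);
        }
    }
}
```

A two-way choice such as “n == 1 ? singular : plural” cannot express the difference between 2, 5 and 22 here.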

On top of that, the tools for extracting strings from code, merging new ones and managing old translations aren’t nearly as advanced as those of GNU Gettext.

Thus, in this article we will see how to prepare a small webapp for l10n support, using both JSP and a bit of “normal” Java code. We will first set up the webapp, then mark the strings, and finally translate them and compile them into native ResourceBundles.

Making the needed libraries available

If you haven’t already created a web application, go ahead and do so now. In your $(YOUR_JSP_SERVER)/webapps/ directory, make a new folder, for example “gettext-sample”. Then create the other sub-directories you need: gettext-sample/WEB-INF/classes, gettext-sample/WEB-INF/lib, gettext-sample/WEB-INF/conf.

The only library you need is libintl.jar. If you’re on GNU/Linux, chances are that you’ve already got it installed on your filesystem, presumably in /usr/share/java/. Otherwise, you can download it, along with all the other gettext utilities you’ll need, as part of the most recent gettext package you’ll find on a GNU mirror.

Once you have downloaded or located it, put libintl.jar into the WEB-INF/lib sub-directory. All done!

Writing a helper class

It’ll be much more convenient if you create a helper class with some functions you’ll use throughout your app. Feel free to put it in the package of your choice. So, create a Translation.java file, e.g. in WEB-INF/classes/a/package/of/your/choice, and put into it something along the lines of:

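// Placeholder name: "package" is a reserved word in Java, so pick a real
// package name here (and use the same one in the JSP import and the Makefile).
package a.package.of.your.choice;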

import gnu.gettext.GettextResource;
import java.util.ResourceBundle;
import java.util.Locale;
import java.text.MessageFormat;
import java.util.Hashtable;

public class Translation
{
   private static Hashtable<Locale, ResourceBundle> trht =
        new Hashtable<Locale, ResourceBundle> ();

   private ResourceBundle myResources = null;

   public Translation (Locale locale)
   {
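      synchronized (trht)
      {
         // containsKey (), not contains (): the latter searches the values
         if (!trht.containsKey (locale))
         {
            try
            {
               myResources = GettextResource.getBundle ("translation",
                                                        locale);
            }
            catch (Exception e)
            { /* No catalogue found: fall back to the untranslated strings */ }

            // Hashtable rejects null values, so only cache real bundles
            if (myResources != null)
               trht.put ((Locale) locale.clone (), myResources);
         }
         else
            myResources = trht.get (locale);
      }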
   }

   public String _(String s)
   {
      if (myResources == null) return s;
      return GettextResource.gettext (myResources, s);
   }

   public String N_(String singular, String plural, long n)
   {
      if (myResources == null) return (n == 1 ? singular : plural);
      return GettextResource.ngettext (myResources, singular,
                                       plural,      n);
   }

   public String format (String s, Object ... args)
   {
      return MessageFormat.format (_(s), args);
   }

   public String formatN (String singular, String plural, 
                          long n, Object ... args)
   {
      return MessageFormat.format (N_(singular, plural, n), args);
   }
}

Notice how we set the baseName of our GettextResource to “translation”. If you want to use another domain name, remember to change it in the Makefile presented further down as well.

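We’ll use the short-named “_ ()” and “N_ ()” functions to mark our strings for translation. As you can see, the first one accepts a simple string and returns that string in the target language if possible; the second one lets you specify two different strings, one for the singular and one for the plural form, and picks the right translation for the given number. Note that newer Java versions (9 and later) reserve a lone underscore as a keyword, so there you will have to pick another short name and pass that name to xgettext with the --keyword option instead.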
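The key point is that the source string itself is the lookup key, and it is also the fallback when no translation exists. The following sketch imitates that behaviour with a plain ListResourceBundle; it is only a stand-in for illustration (the class, the tr () method and the catalogue content are hypothetical), not what libintl.jar does internally.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

class MsgidLookupDemo {
    // Hypothetical stand-in for a compiled Italian catalogue:
    // the key is the original (English) string.
    static final ResourceBundle ITALIAN = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "Anonymous", "Anonimo" } };
        }
    };

    // Returns the translation of msgid, or msgid itself when the bundle
    // is missing or does not contain the string.
    static String tr(ResourceBundle bundle, String msgid) {
        if (bundle == null) {
            return msgid;
        }
        try {
            return bundle.getString(msgid);
        } catch (MissingResourceException e) {
            return msgid;
        }
    }

    public static void main(String[] args) {
        System.out.println(tr(ITALIAN, "Anonymous"));      // translated
        System.out.println(tr(ITALIAN, "Not translated")); // falls back
        System.out.println(tr(null, "No catalogue"));      // falls back
    }
}
```

Because of this fallback, an untranslated or half-translated catalogue never breaks the page: the user simply sees the original string.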

I also added a couple of extra methods for your convenience, namely format () and formatN (). These take Java strings containing positional placeholders, plus the parameters to substitute for them. This makes everything much easier for the translator, especially because it keeps HTML tags out of the translatable strings; it also allows the order of the parameters to change in the translated strings (think of an English date such as “January 4th” versus “4 Gennaio” in Italian). Their behaviour is roughly analogous to that of C’s sprintf (). Keep in mind that MessageFormat treats a single apostrophe as a quoting character, so a literal apostrophe in a pattern has to be doubled. See the MessageFormat documentation for more details.
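Here is a small, self-contained sketch of the two MessageFormat behaviours that matter most for translators: reordering placeholders and quoting apostrophes. The patterns are invented for the example; in the real application they would be the msgid and its translated msgstr.

```java
import java.text.MessageFormat;

class MessageFormatDemo {
    // Hypothetical source pattern and its Italian translation: the
    // translator is free to swap the order of the placeholders.
    static final String DATE_EN = "Date: {0} {1}";
    static final String DATE_IT = "Data: {1} {0}";

    static String date(String pattern, String month, int day) {
        // The day is passed as text to avoid locale-specific digits.
        return MessageFormat.format(pattern, month, String.valueOf(day));
    }

    static String greet(String pattern, String user) {
        return MessageFormat.format(pattern, user);
    }

    public static void main(String[] args) {
        System.out.println(date(DATE_EN, "January", 4)); // Date: January 4
        System.out.println(date(DATE_IT, "Gennaio", 4)); // Data: 4 Gennaio

        // A doubled apostrophe yields one literal apostrophe.
        System.out.println(greet("You''re logged in as {0}.", "anna"));
        // A single apostrophe starts a quoted section, which silently
        // disables the placeholder that follows it.
        System.out.println(greet("You're logged in as {0}.", "anna"));
    }
}
```

The last call prints “Youre logged in as {0}.”, so it pays to either double the apostrophes or avoid them in strings that go through format ().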

You may wonder why I went through all that trouble with a Hashtable and only offer instance-bound methods instead of static ones, whereas the GNU Gettext Manual and the Autoconf one advise you to statically import the functions into your code.

The reason is that JSP containers serve requests from several threads in most implementations (Apache Tomcat, for example), so if the language were kept in static state and the server tried to serve pages in two different languages at the same time… bang! You’re dead. You wouldn’t like to have a page half in Hindi, half in French, interspersed with Italian and Russian, would you?

Of course, if you have a normal Java application, feel free to use static methods: if there is just one user, or all users share the same language, no harm will occur.
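The problem can be shown without even starting threads. In this deliberately simplified sketch (all names are made up), two requests are interleaved by hand: the static field is overwritten by the second request, while the per-request objects keep their own locale.

```java
import java.util.Locale;

class SharedLocaleDemo {
    // Anti-pattern: a single locale shared by every request in the JVM.
    static Locale current;

    // Per-request alternative: each instance remembers its own locale,
    // much like the Translation helper in this article.
    static final class PerRequest {
        private final Locale locale;

        PerRequest(Locale locale) {
            this.locale = locale;
        }

        String language() {
            return locale.getLanguage();
        }
    }

    // Simulates request A (French) being interrupted by request B
    // (Italian) and reports the language A would see afterwards.
    static String sharedAfterInterleaving() {
        current = Locale.FRENCH;   // request A starts rendering
        current = Locale.ITALIAN;  // request B starts before A is done
        return current.getLanguage();
    }

    public static void main(String[] args) {
        PerRequest a = new PerRequest(Locale.FRENCH);
        PerRequest b = new PerRequest(Locale.ITALIAN);
        System.out.println("shared state, request A sees: "
                           + sharedAfterInterleaving());
        System.out.println("per-request, A sees: " + a.language()
                           + ", B sees: " + b.language());
    }
}
```

With real threads the interleaving is unpredictable, which is what makes this kind of bug so hard to reproduce.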

Preparing the rest of your code

Here I’ll use JSP snippets to speed things up; in a real project, consider using taglibs. I’ll only show you how to prepare a .jsp page, since the changes to plain Java code are fairly trivial and follow the same lines.

I won’t do any real locale negotiation between the Accept-Language header sent by the client and the locales the application actually supports; I simply take the preferred locale of the request. You may plan to ask the user explicitly which language to use and then store that in a cookie or session variable, or do the negotiation yourself.
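If you do want to negotiate, one possible approach is sketched below. It assumes Java 8 or later (for Locale.LanguageRange and Locale.lookup), and both the list of supported locales and the English fallback are assumptions made for the example. In a servlet you would pass it request.getHeader ("Accept-Language").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleNegotiationDemo {
    // Assumption: the languages we have catalogues for.
    static final List<Locale> SUPPORTED =
        Arrays.asList(Locale.ENGLISH, Locale.ITALIAN);

    // Assumption: what to serve when nothing matches.
    static final Locale FALLBACK = Locale.ENGLISH;

    // Picks the best supported locale for an Accept-Language value.
    static Locale negotiate(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return FALLBACK;
        }
        try {
            // Parses the header and orders the ranges by their q weight.
            List<Locale.LanguageRange> ranges =
                Locale.LanguageRange.parse(acceptLanguage);
            // Tries each range in turn, truncating it-IT to it and so on.
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : FALLBACK;
        } catch (IllegalArgumentException e) {
            // Malformed header: do not fail the request because of it.
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        System.out.println(negotiate("fr-FR,it;q=0.8,en;q=0.5"));
        System.out.println(negotiate("it-IT"));
        System.out.println(negotiate("de"));
    }
}
```

The resulting Locale can then be handed to the Translation constructor instead of the raw request locale.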

Take a look at this code:

<%@page language="java" pageEncoding="utf-8"
        contentType="text/html;charset=utf-8"
%><?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<%@page import="a.package.of.your.choice.Translation" %>
<%
   /* We're not going to do any locale negotiation here...
    * but you should! Else you could store prefs in a cookie
    */
   java.util.Locale locale = request.getLocale ();

   Translation t = new Translation (locale);
%>
<html xml:lang="<%= locale %>" lang="<%= locale %>">
<head>
  <title><%= t._("Gettext sample") %></title>
  <meta http-equiv="Content-Type"
        content="text/html; charset=utf-8" />
</head>

<body>

  <!-- Herein an example using both gettext in a straight manner,
       and with the Translation::format () function - a variant
       of C's printf (). -->

  <div id="summary">
    <%
       String username = request.getUserPrincipal () == null ?
                         /* TRANSLATORS: This is the anonymous user,
                          * unknown by the system. */
                         t._("Anonymous") :
                         request.getUserPrincipal ().getName ();
    %>
    <input type="hidden" name="username" value="<%= username %>" />
    <%= t.format ("You are logged in as user {0}.",
                  "<em>" + username + "</em>") %>
  </div>
</body>

</html>

Writing a Makefile

If you’re an Ant guru, put your knowledge to good use… I’ll stick with a plain Makefile, thanks. This is a quick-and-dirty one, conjured up in less than five minutes; it could use some love, but for the sake of our example it’ll do. Remember that the command lines of each rule must start with a tab character, not with spaces. So, here goes WEB-INF/Makefile:

.PHONY: all clean update-l10n update-pot

JAVA_SOURCEDIR = classes/a/package/of/your/choice

JAVA_SOURCES = \
  $(JAVA_SOURCEDIR)/Translation.java

JSP_PAGES =             \
  ../Sample.jsp         \
  ../AnotherPage.jsp

LANGUAGES = \
  en        \
  it

TPL_FILE = conf/messages.pot

POFILES   = $(addprefix conf/, $(addsuffix .po, $(LANGUAGES)))

# ----------------------------------------------------------

all: update-l10n $(JAVA_SOURCES)
  javac -classpath lib/libintl.jar $(JAVA_SOURCES)

clean:
  find classes/ -name "*.class" -delete

update-l10n: update-pot $(POFILES)
  for i in $(LANGUAGES); do \
    msgfmt --java2 -r "translation" -l $$i "conf/$$i.po" \
                   -d classes/; \
  done

update-pot: $(JAVA_SOURCES) $(JSP_PAGES)
  xgettext --from-code utf-8 -L Java --force-po \
    --keyword=_ --keyword=N_:1,2                \
    --keyword=format --keyword=formatN:1,2      \
    --add-comments=TRANSLATORS                  \
    -o conf/messages.pot $^

# ---------------------------------------------------------

conf/%.po: $(TPL_FILE)
  if [ '$*' = 'en' ]; then msgen -o $@ conf/messages.pot; \
  else msgmerge -U $@ $<; fi

You should set up your .po files with the initial (empty) translation by running msginit manually. The “en” language catalogue is considered here to be the default one, so it’s regenerated every time with the identity function (i.e. the translated string is a copy of the string to translate) by using msgen. You may want to change this, of course.

So, run “make update-pot” once so that messages.pot exists, then go into the WEB-INF/conf/ sub-directory and, for each language other than English, run “msginit -l $(YOUR_LL_CODE)”.

Also, notice how we pass xgettext the names of the functions we use for translation with the --keyword option. This is necessary so that it recognizes all the strings marked for localization and, when run, puts them in the messages.pot file. You don’t have to edit this file directly; instead, initialize (with msginit) a catalogue for a specific language and then edit the respective .po file.

Finally, you could include in the output all the comments preceding keywords by using the “-c” parameter of xgettext without any tag. Here, instead, I used the “TRANSLATORS” tag in order to distinguish comments aimed at programmers from those aimed at translators. This matters because it helps the translator assess the context of a message, and it makes everyone’s life much easier.

Translating the strings

Finally!

Once you have created your .po files in WEB-INF/conf with msginit, just translate all the “msgstr”s with the editor of your choice. I recommend GNU Emacs or, if you can’t wrap your mind around it, GTranslator.

For a description of the .po file format, see the manual. Also notice that, because we passed xgettext the “-c” (--add-comments) parameter, the tagged comments just preceding the marked strings are extracted and shown to the translator in comment lines starting with “#.”. This is a great way to add contextual information, especially about positional parameters embedded in translatable strings.

Then, firing up make again will create the relevant Java .class files in the WEB-INF/classes directory.

Restart your web server (or enable dynamic class reloading) and you’re done! You can test whether it works by changing the preferred content language in your web browser. For example, in Firefox 3.x, it’s in “Preferences” » “Contents” » “Languages” subsection » “Choose” button.

Cheers!
Matteo Settenvini

5 Responses to Java Webapp localization through Gettext

  1. Matteo Settenvini says:

    Just a quick note:

    Monolingual translation formats are considered harmful:
    http://translate.sourceforge.net/wiki/guide/monolingual

    And it’s perfectly true.

  2. Lokesh Bhatt says:

    Hi,

    How can we identify programmatically whether a PO file is monolingual or bilingual?

    Thanks for your kind support.

    • tchernobog says:

      Hi Lokesh,

      All PO files should be bilingual for the benefit of the translator; they are not used directly by the application (your program does not read them), instead they are compiled into .mo files by gettext, mostly for performance purposes.
      So from your application you would not do anything with them; in fact, you are not even forced to distribute them (even though I do not recommend leaving them out, since end users may want them for customization).

      Maybe I did not understand the question right?

      Cheers,
      Matteo

