Pseudolocalization to Catch i18n Errors Early

Pseudolocalization is the automated generation of fake translations of a program’s localizable messages. Using the program in a pseudolocale generated in this manner facilitates finding bugs and weaknesses in the program’s internationalization. Google has been using this technique internally for some time, and has now released an open-source Java library to provide this functionality at http://code.google.com/p/pseudolocalization-tool/.

Take a look at the following screenshot and see if you can spot the internationalization problems:

Some common problems in localizing applications are:

  • Non-localizable text A program may have hard-coded text in the program source itself, or parts of messages may come from other non-localized sources (such as from a database), or there could be bugs in finding and using localized messages. The accents pseudolocalization method helps identify these problems by replacing US-ASCII characters with accented or otherwise modified versions, while still remaining readable. That way, if you see unaccented text, you know that text will not be localized for real locales either.
  • Combining separately translated sentences or paragraphs A program may “piece together” complete sentences/paragraphs from smaller translated strings. This is a problem since some locales may require that parts of the sentence/paragraph be reordered, or the translation may depend on the context of what else appears in the sentence. The brackets pseudolocalization method adds [ and ] around each translated string to clearly show translation boundaries. If you see a sentence/paragraph broken up into multiple bracketed pieces, you know it is likely to be impossible to translate well for some locales.
  • User interface elements don’t give sufficient space for translated text Some languages, such as German, frequently have translations that are much longer than the original message. If the user interface does not allow sufficient room for such languages, parts of it may appear wrapped, truncated, or have unsightly scrollbars in such locales. If the developer only tests with the original language, the developer may not discover these problems until late in the development cycle. The expander pseudolocalization method addresses this by making all the strings longer. Looking at the resulting pseudolocale will allow the developer to find such problems long before real translations are available and without requiring knowledge of other languages.
  • Improperly “mirrored” user interfaces in right-to-left locales Some languages like Arabic and Hebrew are written right-to-left. A user interface in a right-to-left (RTL) language needs to be laid out in a mirror image of a left-to-right layout. The only way to tell if this has been done properly is to try the user interface in an RTL locale. However, using a real RTL language is problematic both because translations are unlikely to be available until late in the development cycle and because few developers might be able to read any RTL languages.
Using untranslated LTR messages in an RTL user interface is problematic because it masks real problems and creates apparent problems where none actually exist. Just setting dir=”rtl” on the HTML element of an otherwise-English UI produces something like this:

Note how some of the punctuation is misplaced and not mirrored, the radio buttons are a mess, and the order of the menu items isn’t mirrored. None of these happen to be real problems — they will disappear when the English strings are replaced with real translations.

The solution is the fakebidi pseudolocalization method, which takes the original source text and adds Unicode characters to it to make it behave just like real RTL text while remaining readable (though backwards).

We find the most useful combinations of pseudolocalization methods to be accents/expander/brackets for finding general internationalization problems, and fakebidi for finding RTL-related problems. We use BCP47 variant subtags to identify locale names that get pseudolocalized translations: psaccent (as in en-psaccent) gets the accents/expander/brackets pseudolocalization, and psbidi (as in ar-psbidi) gets fakebidi. Note that for psbidi, using a real RTL language subtag is recommended since that will trigger RTL handling in most libraries/frameworks without any modifications. We hope to get these variant tags accepted as standard.

We have taken the sample application from above and run its translatable text through psaccent and psbidi pseudolocalization. Now take a look and see how much more easily internationalization problems can be identified:

Note that three problems have been revealed by the use of psaccent:

  • the space provided is insufficient for longer translations
  • “Add Contact” is split across two messages making it difficult to translate correctly
  • the button text has not been translated

We can fix those problems, then check for Bidi problems by using psbidi:
Notice this doesn’t introduce problems that aren’t there like the earlier example, but it does show that the Help menu item does not float to the left side of the window as it should.

Initially, this is just a library for use with other tools. We plan to write a command-line tool for taking message sources and producing fake translations of them. In addition, we are in the process of integrating this library with GWT, so GWT users can take advantage of it just by inheriting one module.

In summary, pseudolocalization is useful for finding internationalization problems early in the development process and enabling the developer to fix them before wasting money on translations that may have to be changed to fix the problems anyway. We hope you will use this library to help make your application usable by more people, and we welcome contributions and discussions at https://groups.google.com/forum/#!forum/pseudolocalization-tool.

GWT Goes Bidirectional!

Over 100 million web users speak languages that are written right-to-left, such as Arabic, Hebrew and Persian.

Right-to-left language support is not new to GWT. GWT makes it easy to build applications localized to both right-to-left (RTL) and left-to-right (LTR) locales, mirroring the layout of the page and its widgets so that an RTL-language page flows right-to-left. GWT even makes it easy to mirror those “handed” images that need to point in a direction that makes sense within the page. So, what’s the problem?

The Challenge of Bidi Text

An application localized to an RTL language is often used to access both RTL and LTR data, simply because data in some left-to-right languages like English is so widespread. For example, an Arabic movie review application may need to display Latin-script movie titles. Conversely, native RTL-language speakers often choose to use an English version of an application, but still use it to access and enter RTL data too. The point is that the data should be displayed in its own direction, regardless of the context’s direction. As a result, the text of the page as a whole goes in both directions; the page’s text is bidirectional, or “bidi”.

Unfortunately, inserting an opposite-direction phrase into your page without explicitly indicating where it begins and ends, often garbles it and/or the text surrounding it, putting the words and punctuation in the wrong order. For example, in an RTL context, “10 Main St.” is displayed incorrectly as “.Main St 10”. For short, we call the capability to display opposite-direction text correctly “bidi text support”.

The good news is that GWT has recently been enhanced with some powerful features for supporting bidi text. They allow your application to correctly display and enter opposite-direction text with very little extra effort.

Built-in Bidi Text Support in TextBox and TextArea

Since GWT 2.1 release, the TextBox and TextArea widgets are capable of automatically adjusting their direction as text is being entered (GWT Showcase example). This feature is enabled by default when at least one of the application’s locales is RTL, and otherwise can be enabled manually.

Built-in Bidi Text Support in Other Widgets

As of GWT 2.2 (to be released soon), several widely used widgets such as Label, HTML, Anchor, Hyperlink and ListBox gain built-in bidi text support by exposing two new interfaces:

  • The HasDirectionalText interface allows declaring the direction of text whose direction is known in advance. For example, a Label holding a phone number is always LTR, even in RTL locales, and must be declared as such to prevent garbling. You can do this by passing an extra parameter to either the constructor:
    Label label = new Label(phone, Direction.LTR);
    or to setText():
    label.setText(phone, Direction.LTR);
  • The HasDirectionEstimator interface allows widgets to cope with text whose direction is unknown (for example text previously entered by users). When direction estimation is enabled, the widget estimates the direction of the text content at runtime, and sets its display direction accordingly. Note that usually there’s no need to use this feature for boilerplate text (i.e. messages), since such text usually matches the overall page direction.

In the following example, an imaginary application collects reviews of books from the users. A user can add to the repository a new book with its name and review. To view the review of a specific book, one can select its name from a ListBox holding the books’ names. Note that books’ names may be of both directions, especially in RTL locales.

public class BookReviewApp() {
  private ListBox bookNamesListBox;
  …
  public BookReviewApp() {
    // Enable bidi support.
    bookNamesListBox.setDirectionEstimator(true);
    …
  }
  public addBook(String bookName, String review) {
    // ListBox item’s direction will be set automatically.
    bookNamesListBox.add(bookName);
    …
  }
  …
}

Bidi and Messages

BidiFormatter is a new class providing bidi text support primitives like estimating a string’s direction and wrapping it in HTML for correct display in a potentially opposite-direction context. The new built-in bidi text support features in the widgets mentioned above use BidiFormatter to do their stuff. Sometimes, however, you may need to use BidiFormatter directly. It is particularly useful with message placeholder values. While the message as a whole is usually localized to the user interface language, placeholder values may be in an arbitrary language and thus may have the opposite direction. Use BidiFormatter’s methods to wrap such values before inserting them into the message. A comprehensive example of using BidiFormatter with messages is available in the GWT Showcase.