On Internationalization and Localization

Internationalization and localization are two important parts of software development and design. Unfortunately, they are often neglected or not fully understood. This article gives an introduction to the fields of I18N and L10N, along with many examples and useful resources.

Pascal Birchler

At the recent Drupal Mountain Camp in Davos, Switzerland, I gave a presentation on the importance of internationalization and localization in software development. Despite being the only WordPress person at the Drupal community event, the talk was very well received. That’s why I decided to write down my thoughts in a blog post.

(Hat tip to Amazee Labs for the picture)

As you might already know, internationalization and localization have to do with languages and translations. But why should you care? Well, the answer is quite simple: your users may not speak English. And if they do, it may not be their native language.

When I started building websites I eventually heard of the term internationalization and its place in software development. I learned about internationalization tools that were created almost 30 years ago to solve the same problems I was struggling with.

Along the way, I encountered many misconceptions and wrong assumptions, such as that everyone understands English, or that you just need to translate your website and that’s it. However, there are a few things wrong with that.

A Short Trip to Bulgaria

Before I go on to explain things, I want to start with a story about the beautiful country of Bulgaria. I’ve been to Bulgaria twice so far. Before I went there the first time, I did what most people would do: I googled it.

I learned a few things about the history, the language, currency, the capital, and so on. But only recently, after travelling back to Switzerland, I found out about one special thing:

In Bulgaria, nodding means no, shaking your head means yes.

Wait a second… that’s not how I learned this stuff! That’s the exact opposite of what we’re used to in Switzerland and many other parts of the world.

Sofia, Bulgaria.

Luckily, I cannot remember a situation where I was nodding or shaking my head and something unexpected happened. Otherwise, it would have been really confusing.

However, that experience made me realise how important culture is when communicating in another language. Applied to our profession, this means that when building software we need to do more than just “make things translatable”.

Internationalization vs. Localization

Going back to the introduction and the title of this post, I find this definition of the term internationalization very on point:

Internationalization is the process of creating a product in such a way that it can be easily adapted to specific local languages and cultures.

The next step after internationalizing a product is to localize it:

Localization is the process of adapting a product or service to a particular language, culture, and desired local “look-and-feel.”

In other words: localization is like translation but with a cultural twist.

Note that most often you’ll hear about these terms in their abbreviated forms, I18N and L10N, respectively.

Locales

As I mentioned before, internationalization is mainly about culture. There are two important influences on our culture: language and geographic environment.

They shape our lives and how we present ourselves to others. To communicate effectively with another person, we must consider their cultural environment.

Similarly, software should respect its users’ language and geographic region to be effective.

Language and region form a locale, which represents the target setting and context for localized software. So whenever we talk about L10N and I18N, it’s about locales, not just languages.

For example, English as spoken in the US (en-US) is different from the English spoken in Australia (en-AU) or Great Britain (en-GB). Not necessarily better, but different.

The same goes for French (fr-FR, fr-CA, etc.) or German (de-CH, de-DE, de-AT), of course. These are even more special, as there are formal and informal variants that we need to distinguish between.
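
To make the "language plus region" idea concrete, here is a minimal sketch in plain PHP. The helper name `split_locale()` is made up for this example, and it deliberately ignores everything beyond a simple language-region pair (script subtags, variants, and so on). Locale tags are compared case-insensitively, but the region is conventionally written in uppercase, which is what the helper normalizes to.

```php
<?php
// Hypothetical helper: split a simple "language-region" tag into its parts.
// Simplified on purpose: script subtags (e.g. "zh-Hant-TW") are not handled.
function split_locale( string $tag ): array {
    $parts = preg_split( '/[-_]/', $tag );

    return [
        'language' => strtolower( $parts[0] ),
        'region'   => isset( $parts[1] ) ? strtoupper( $parts[1] ) : null,
    ];
}

// Same language, three different locales.
foreach ( [ 'de-ch', 'de-de', 'de-at' ] as $tag ) {
    $locale = split_locale( $tag );
    echo $locale['language'], ' as used in ', $locale['region'], PHP_EOL;
}
```

The point of keeping both parts around is that software can then pick region-specific behaviour (formats, wording) without treating every German speaker the same.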

Flags ≠ Languages

This leads me to my next point: flags do not represent languages.

If you have a language switcher on your website, do not use flags. Why? Because flags are symbols. They represent countries or nations.

Languages, on the other hand, represent a shared method of communication between people. Languages don’t have any borders. By using a flag for a language, you may confuse or even offend users. For example, as a British person you don’t want to see the US flag being used to refer to the English version of a website, just like I, as a Swiss person, don’t want to see a German flag being used to refer to the German version of a local website.

If you don’t believe me and want some more information about this, there’s even a dedicated website on the topic: flagsarenotlanguages.com.

Dates & Units

Locales deal with many different aspects like date formats, length of weeks, which day the week starts on, units of measurement, and so on.

These really aren’t the same in every country and it’s worth keeping this in mind. For example, you don’t want to send your weekend celebration newsletter to someone who’s in the middle of their workweek.

The most annoying part about dates is certainly formatting. Is it DD/MM/YYYY? MM/DD/YYYY? Or even YYYY-DD-MM? Without that context, we don’t know if a date like 11/05/2017 means May 11th or November 5th.

Date format by country. Green: MM/DD/YYYY. Brown: DD/MM/YYYY. (source: Wikipedia)
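
The ambiguity is easy to reproduce. The following sketch parses the very same string with two regional conventions using PHP's built-in `DateTime::createFromFormat()` and prints both results in the unambiguous YYYY-MM-DD form. The variable names are only for illustration.

```php
<?php
// The same input string, read with two different regional conventions.
$input = '11/05/2017';

// "!" resets all unspecified fields (the time), so only the date matters.
$dayFirst   = DateTime::createFromFormat( '!d/m/Y', $input ); // e.g. Switzerland, UK
$monthFirst = DateTime::createFromFormat( '!m/d/Y', $input ); // e.g. United States

// Print both interpretations in an unambiguous format.
echo $dayFirst->format( 'Y-m-d' ), PHP_EOL;   // 2017-05-11
echo $monthFirst->format( 'Y-m-d' ), PHP_EOL; // 2017-11-05
```

Both calls succeed without any warning, which is exactly the problem: nothing in the string itself tells you which reading was intended.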

Don’t believe me that this is still a real problem today? Here’s a recent example:

https://twitter.com/jimwaterson/status/825017459530936320

You get the point.

Just note that there’s so much more to the picture, like numbers, currencies, formal/informal variants, and even humour.

Internationalization in User Interfaces

As we’ve seen, internationalization affects all of us. And it affects a lot. This means we must already consider it when tinkering with mockups and prototypes.

This is usually the area that’s neglected the most and I18N only comes after the fact. But having to redesign your application when it turns out to be unusable in different languages sucks and is expensive. You don’t want to run into this.

Symbols

While there are some symbols that usually work around the world, like a magnifying glass for search or a pencil for editing something, symbols can be culture-specific as well.

For example, for most of us ✅ means correct or OK. In some East Asian countries like Japan, however, it can be used to mean that something is incorrect. Their symbol for correct is actually a ⭕️.

This needs to be accounted for when designing a user interface. Perhaps these icons might even need to be swapped during localization.

Personal Names

My first name is Pascal, my last name is Birchler. That’s dead simple to support in an application. However, personal names can be way more complex.

In some countries the name is usually written with the family name first, so exactly the other way around. Some cultures might not even have typical family names or first names.

Let’s take a name like David Lloyd George. Is Lloyd part of the given name or the family name? Without some cultural context, you don’t know for sure.

Many European names contain particles or prepositions such as “de” or “von”. They’re usually part of the family name, but there are exceptions; in French, for example, the particle is omitted in some cases. When you have a UI for sorting people by family name, you need to account for such particles.
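
As a rough illustration of what "accounting for" these particles could mean when sorting, here is a small sketch. The particle list and the function name `family_name_sort_key()` are assumptions made for this example. Real conventions differ per language and sometimes per family, so a fixed list like this is not a general solution.

```php
<?php
// Hypothetical list of particles to ignore when sorting; real rules differ
// per language and even per family, so treat this as illustrative only.
const NAME_PARTICLES = [ 'de', 'du', 'la', 'van', 'von', 'der' ];

// Build a sort key by dropping leading particles from a family name.
function family_name_sort_key( string $familyName ): string {
    $words = explode( ' ', strtolower( $familyName ) );

    // Always keep at least one word, even if it looks like a particle.
    while ( count( $words ) > 1 && in_array( $words[0], NAME_PARTICLES, true ) ) {
        array_shift( $words );
    }

    return implode( ' ', $words );
}

$familyNames = [ 'von Humboldt', 'Birchler', 'van der Waals', 'de Gaulle' ];

usort( $familyNames, function ( string $a, string $b ): int {
    return strcmp( family_name_sort_key( $a ), family_name_sort_key( $b ) );
} );

echo implode( ', ', $familyNames ), PHP_EOL;
// Birchler, de Gaulle, von Humboldt, van der Waals
```

Note that the names are still displayed in full. Only the key used for ordering changes.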

Implications for Field Design

Now, how do you account for all these differences? One possible approach is to localize forms for a particular culture, tailoring them exactly to the needs of the audience. Unfortunately, that’s rather expensive and doesn’t solve the storage part.

Before you go that route, ask yourself: do I really need to have separate fields for given name and family name? Perhaps one field for the full name is already enough.

A single form field for the full name
Do you really need two form fields for a person’s name?

Let’s say we stick with two fields. How do you name them? Labels like first name and last name are confusing when you normally write your family name first. Thus, labels like family name and given name should be preferred.

Two form fields for family and given name
Ask for the user’s family name and given names

Another possibility is to ask the user what they want to be called, for example during registration.

One free-form text field for the user's name
Let the user tell you what they want to be called

For more examples, check out the W3C article about Personal names around the world.

Account for Translations

Another thing that needs to be considered is that the length of the source and translated text is likely to be different.

Here’s an example I took from Airbnb:

Reviews section on Airbnb

Super clean, isn’t it? You can easily see the total rating and the average rating for each sub-section. Now, let’s have a look at the German UI:

Reviews section on Airbnb in German

Whoops! Airbnb didn’t consider that German words tend to be longer than English ones. Or perhaps they did consider it, but just didn’t care.

As a user, I had to look twice to see which section the stars were for. Is there a label missing on the left? Is there no rating for accuracy?

Left-to-Right vs. Right-to-Left

In certain languages like Arabic and Hebrew, text is read from right to left, which basically means your entire design needs to be flipped. A modular design approach will come in handy for this.

Also, you’d need a Unicode-compliant font that supports a wide range of characters. A good example is the Noto font family developed by Google, which aims to support all languages in the best way possible. No matter if it’s English, Chinese, or Arabic.

Pro Helvetia is a good example of a website with both LTR and RTL support.

Luckily, supporting RTL languages in web design has become amazingly easy in the last few years. For example, the open-source RTLCSS library drastically improves RTL compatibility by basically flipping CSS rules, e.g. it turns all usages of float: left; into float: right;. At the end, you only have to do some fine-tuning.
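
To give a feeling for what "flipping CSS rules" means, here is a deliberately naive sketch. It is not how RTLCSS is implemented. The function name `flip_direction()` is made up, and it only swaps the values of `float` and `text-align`. A real tool covers many more properties and edge cases.

```php
<?php
// Naive illustration of direction flipping; this is NOT how RTLCSS is
// implemented, and it only handles `float` and `text-align` values.
function flip_direction( string $css ): string {
    return preg_replace_callback(
        '/\b(float|text-align)(\s*:\s*)(left|right)\b/',
        function ( array $m ): string {
            $flipped = 'left' === $m[3] ? 'right' : 'left';
            return $m[1] . $m[2] . $flipped;
        },
        $css
    );
}

$ltr = '.logo { float: left; } .price { text-align: right; }';
echo flip_direction( $ltr ), PHP_EOL;
// .logo { float: right; } .price { text-align: left; }
```

Matching on the property name first keeps selectors such as `.sidebar-left` untouched, which a blind search-and-replace of "left" would break.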

Internationalization in Your Code

After covering some important UI aspects of internationalization, let’s have a look at the code side of things.

A good rule of thumb is to internationalize all the things. When not every part of your product can be localized, it will result in a bad experience for both translators and users. So it really starts with the UI and needs to be continued throughout the code base.

String Extraction

An important thing to remember when internationalizing your code is that the tools that generate translation files from it will not run your code.

They extract translatable text using pattern matching. This means that you can’t pass variables as your translatable text: the tools will not know what the variable values are and therefore can’t extract the text.
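
A toy extractor makes this tangible. The sketch below scans a piece of source code (kept as a string, never executed) for string literals passed directly to `__()`. The function `extract_strings()` and its regular expression are assumptions for this example. Real extraction tools are far more thorough, but they share the key property of reading the code rather than running it.

```php
<?php
// Toy extractor: finds string literals passed directly to __().
// Real tools are more sophisticated, but they also only read the source.
function extract_strings( string $source ): array {
    preg_match_all( "/__\(\s*'([^']*)'\s*\)/", $source, $matches );
    return $matches[1];
}

// Source code to scan, kept in a nowdoc so nothing is interpolated or run.
$source = <<<'PHP'
$label = get_label();
echo __( $label );          // variable: invisible to the extractor
echo __( 'Save changes' );  // literal: found
PHP;

// Only the literal ends up in the list of translatable strings.
print_r( extract_strings( $source ) );
```

Whatever `get_label()` would return at runtime never reaches the translation file, so translators never get the chance to translate it.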

String Concatenation

In addition to not using variables, you should never concatenate strings. Appending one string to another always results in a bad localization experience. After all, sentences aren’t formed the same way in every language. Here’s a quick example:

// Anti-pattern: the number is glued on outside the translatable string.
echo $num . ' ' . __( 'items' );

As you can see, the number of items is separated from the translatable string. Now, for someone trying to localize this, there’s no chance they can differentiate between singular and plural forms.

But it’s not just about variables. Punctuation is another thing, e.g. in French there are usually non-breaking spaces before colons and question marks, which means these need to be translatable as well.
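
The punctuation case can be sketched with a tiny in-memory catalog standing in for a French translation file. Both the catalog and the `translate_with()` helper are hypothetical; in WordPress the lookup would be done by `__()`.

```php
<?php
// Hypothetical in-memory catalog standing in for a French translation file.
$french = [
    'Name'  => 'Nom',
    'Name:' => "Nom\u{00A0}:", // U+00A0 is a non-breaking space
];

// Minimal stand-in for a translation lookup; falls back to the source text.
function translate_with( array $catalog, string $text ): string {
    return $catalog[ $text ] ?? $text;
}

// Bad: the colon is appended in code, so translators cannot add the space.
$bad = translate_with( $french, 'Name' ) . ':';

// Good: the punctuation is part of the translatable string.
$good = translate_with( $french, 'Name:' );

var_dump( 'Nom:' === $bad );           // bool(true)
var_dump( "Nom\u{00A0}:" === $good );  // bool(true)
```

Only the second variant lets the translator decide what belongs between the word and the colon.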

Plural Forms

Sticking with the items example, let’s explore singular and plural forms. Let’s say we fix the string concatenation issue by using printf() and a placeholder in the text.

printf( __( '%d items' ), $num );

Then we want to know about singular and plural forms, using a simple condition:

// Naive approach: only distinguishes "exactly one" from "everything else".
if ( 1 === $num ) {
    printf( __( '%d item' ), $num );
} else {
    printf( __( '%d items' ), $num );
}

Perfect, right? Well, no. While this might work in German or English, in some languages counting is a bit more complex than that. In fact, the Mozilla Developer Network lists 16 different plural form rules.

That’s why there are usually dedicated functions for translating plural forms (e.g. _n() in WordPress). When extracting the strings from there, this provides the necessary context for translators. They know it’s a string that has a singular and a plural form and the system can display the correct translation based on the amount.
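
To see why a two-way condition is not enough, here is a sketch of the kind of rule a translation system evaluates for you. It mirrors the plural formula commonly declared in gettext catalogs for Polish, which has three forms. The function name `polish_plural_index()` and the sample translations of “file” are illustrative assumptions, not taken from any real catalog.

```php
<?php
// Plural rule as commonly declared in gettext catalogs for Polish:
// form 0 for 1, form 1 for numbers ending in 2-4 (except 12-14),
// form 2 for everything else.
function polish_plural_index( int $n ): int {
    if ( 1 === $n ) {
        return 0;
    }

    $lastDigit     = $n % 10;
    $lastTwoDigits = $n % 100;

    if ( $lastDigit >= 2 && $lastDigit <= 4 && ( $lastTwoDigits < 10 || $lastTwoDigits >= 20 ) ) {
        return 1;
    }

    return 2;
}

// Illustrative translations of "%d file" / "%d files".
$forms = [ '%d plik', '%d pliki', '%d plików' ];

foreach ( [ 1, 2, 5, 12, 22 ] as $num ) {
    echo sprintf( $forms[ polish_plural_index( $num ) ], $num ), PHP_EOL;
}
// 1 plik, 2 pliki, 5 plików, 12 plików, 22 pliki (one per line)
```

In WordPress you would not write this rule yourself. The call site would simply be something like `printf( _n( '%d item', '%d items', $num, 'my-plugin' ), $num );`, where `my-plugin` is a placeholder text domain, and the loaded translation decides which form to use.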

Context

By default, strings are only translated once, no matter where and how often they are being used. In some cases, however, you need to translate a string differently. For example, the word post could refer to both the verb and the noun.

To address this problem, one can specify the context of a translated string. When using _x( 'Post', 'noun' ) and _x( 'Post', 'verb' ), the word will be saved twice in the translation system and translators can localize the verb and noun form accordingly.
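
Conceptually, the context simply becomes part of the lookup key. The sketch below imitates that with a hypothetical in-memory German catalog and a made-up helper, `translate_in_context()`. The separator character is similar in spirit to how gettext stores contextual entries, and the German wording is only illustrative.

```php
<?php
// Hypothetical German catalog. The key combines context and source text,
// similar in spirit to how gettext stores contextual entries.
$german = [
    "noun\x04Post" => 'Beitrag',
    "verb\x04Post" => 'Veröffentlichen',
];

// Minimal stand-in for a context-aware lookup such as WordPress' _x().
function translate_in_context( array $catalog, string $text, string $context ): string {
    return $catalog[ $context . "\x04" . $text ] ?? $text;
}

echo translate_in_context( $german, 'Post', 'noun' ), PHP_EOL; // Beitrag
echo translate_in_context( $german, 'Post', 'verb' ), PHP_EOL; // Veröffentlichen
```

Without the context, both lookups would share one key, and one of the two places in the UI would end up with the wrong word.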

Dates & Units

I mentioned date formatting and units of measurement earlier. Here, it’s important to never ever hard-code currencies, units, or date and time formats, not to mention timezones. These will always be different per locale.
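
Here is a minimal sketch of what “not hard-coding” can look like: the formats are looked up per locale instead of being written into the output code. The table below is hand-written and hypothetical, covering just two locales. In a real project this data should come from your platform or from CLDR-backed libraries rather than from your own array.

```php
<?php
// Hypothetical, hand-written locale data for illustration only.
$localeFormats = [
    'en-US' => [ 'date' => 'm/d/Y', 'decimal' => '.', 'thousands' => ',' ],
    'de-DE' => [ 'date' => 'd.m.Y', 'decimal' => ',', 'thousands' => '.' ],
];

// Format a date and a number according to the given locale's conventions.
function format_for_locale( array $formats, string $locale, DateTimeInterface $date, float $number ): array {
    $f = $formats[ $locale ];

    return [
        'date'   => $date->format( $f['date'] ),
        'number' => number_format( $number, 2, $f['decimal'], $f['thousands'] ),
    ];
}

$date = new DateTimeImmutable( '2017-05-11' );

foreach ( array_keys( $localeFormats ) as $locale ) {
    $out = format_for_locale( $localeFormats, $locale, $date, 1234567.891 );
    echo $locale, ' ', $out['date'], ' ', $out['number'], PHP_EOL;
}
// en-US 05/11/2017 1,234,567.89
// de-DE 11.05.2017 1.234.567,89
```

The calling code never mentions a separator or a date pattern, so supporting another locale means adding data, not touching the logic.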

Useful Resources

Now that we’ve explored some tips & tricks on I18N and L10N, I want to highlight a few useful resources on the whole topic.

The first one is rather obvious. No matter what system you use, read the internationalization guidelines from A to Z.

Best practices change from framework to framework and from version to version. Make sure that you stay up to date. This especially includes security-relevant things like escaping placeholders in your translatable text.
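
As a small illustration of the escaping point: translations are external input, so the final string should be escaped right before output. The sketch below uses PHP's built-in `htmlspecialchars()` so that it runs anywhere. The “translated” string is a made-up example of a translation that contains unexpected markup.

```php
<?php
// Stand-in for a string loaded from a translation file. Translations are
// external input and may contain markup you did not expect.
$translated = 'Hello <script>alert(1)</script> %s';

$userName = 'Pascal';

// Fill in the placeholder first, then escape the whole result for HTML.
$safe = htmlspecialchars( sprintf( $translated, $userName ), ENT_QUOTES, 'UTF-8' );

echo $safe, PHP_EOL;
// Hello &lt;script&gt;alert(1)&lt;/script&gt; Pascal
```

In WordPress, the corresponding helpers are `esc_html()` and `esc_html__()`, for example `esc_html( sprintf( __( 'Hello %s', 'my-plugin' ), $name ) )`, with `my-plugin` being a placeholder text domain.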

For WordPress, I highly recommend checking out the How to Internationalize Your Plugin guide.

After properly internationalizing your project, you want to use something to easily localize it. There are plenty of services for this, like Transifex, POEditor, or PhraseApp. If you prefer an open-source alternative, I can highly recommend the GlotPress WordPress plugin which is led by our very own Dominik Schilling and also powers WordPress’ own translation system, translate.wordpress.org.

To dig deeper into I18N, the W3C has a really thorough collection on the whole topic that is worth looking into.

They even have a dedicated I18N Checker that is similar to the HTML validator. It checks things like character encoding, language declarations, and text direction, and usually provides helpful suggestions.

The Common Locale Data Repository has tons of information as well. They list locale-specific patterns for formatting and parsing, language and country translations, and even information about keyboard layouts.