Hints about i18n – Let your code shine all over the world

Developer handbooks usually tell you how to call the built-in functions for translation. But my experience from translating hundreds of WordPress plugins and themes (and from mentoring the “Polyglots” community for several years) shows that a developer needs to know a few more things about i18n that may not be obvious if you don’t know a few different languages yourself.

First, a word about the abbreviations i18n and L10n. Internationalization and localization are such long and complicated words that we often, lovingly, keep only their first and last letters and replace all the remaining letters with their count: internationalization becomes “i18n”, and localization is usually written as “L10n”.

“i18n”, which this article talks about, is the process of making it possible to localize your code. Of course, you need to call the translation functions in the right way, but here I’m going to talk more about how you can structure your code and strings in a way that allows translators to make really good translations or even localized adaptations of your code.

This is important. More than half of all WordPress sites in the world use a different locale (language version) than “US English”. If you want your code/plugin/theme to be useful all over the world, then you need to do this. And it’s best to do it well.

Use English, US English

Sometimes it can be OK to distribute a plugin in a different language than English. Perhaps you don’t know English well, and the use case of your code is relevant to one single language. This could be the case for an adaptation to the Persian date system, or for a plugin that is specially coded for one single customer.

But if there’s any chance that your code will be used in more than one language, then the strings in your code should use “US English”.

The technical reason is that if a site administrator has selected “US English” as the site language, then WordPress won’t load any translations at all – whatever strings you have put in your code will be used exactly as you wrote them. If you don’t know English well, ask someone to check your strings for clarity, meaning and typos.

There’s also a practical and organizational reason to use English in your code: In the WordPress community we have chosen English as our “common language”. Any “Polyglots” contributor is expected to be able to translate (or check translations) from English to a target language they know well. If your plugin uses, say, German, then the chance of someone translating it to Korean or Hindi would therefore be almost zero.

Give reasonably sized “chunks” as strings for translation

Splitting your text into strings of the right size is both easy and hard at the same time. Strings that are too big, like several paragraphs of text, are harder to translate (and harder to verify). The translator may get lost and miss something in the middle of a long string. Depending on the UI of the translation interface, not all the text in the source and target string may be visible on screen when the translator is working – this would make the work harder.

The translator’s work environment may need to reserve space for translation suggestions from translation memory, existing translations to other languages, and/or machine translations, as well as context references, comments and perhaps some translation lookup features. Strings that are longer than 3-4 lines may therefore be trickier for the translator to handle.

Another argument against very long strings is use of translation memory. Let’s say you’ve got a string that explains five steps in a process, and this was already translated before. If you now correct a typo in one of the steps, this is technically a new string. However, the translation platform will hopefully present the previous translation as a “fuzzy match” so that the translator can reuse the previous translation. But they will be forced to carefully double-check the whole string, all five steps, before the string can be confirmed.

If, instead, you need to add one extra step to the description, so that it contains SIX steps, then this change is probably too big to become a “fuzzy match”. So, even though five of the six steps were already translated before, the translator will need to translate everything again, from zero.

In this case, a good string size would be “one work step per string”. The corrected typo would also be easier to spot, allowing the translator to quickly confirm the fuzzy match. In the other case, with one extra work step, the translator would just need to translate the new step, and the previously translated strings would still nicely remain translated.

Pro tip: If you have such a list of numbered work steps, perhaps the number itself doesn’t need to be inside the string? HTML can number your steps automatically with an ordered list. Or you could use printf() and give the translator a separate string like “Step %s”.
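
To illustrate the “one work step per string” idea, here is a minimal sketch. The helper name myplugin_setup_steps_html(), the step texts and the 'text-domain' value are made up for this example, and the small stand-in for __() is only there so that the sketch also runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: one translatable string per work step.
function myplugin_setup_steps_html() {
    $steps = array(
        __( 'Open the settings page.', 'text-domain' ),
        __( 'Enter your API key.', 'text-domain' ),
        __( 'Click the Save button.', 'text-domain' ),
    );

    // The <ol> element numbers the steps, so no number is needed inside the strings.
    $html = '<ol>';
    foreach ( $steps as $step ) {
        $html .= '<li>' . htmlspecialchars( $step ) . '</li>';
    }

    return $html . '</ol>';
}
```

Adding a fourth step now creates exactly one new string to translate. Inside WordPress you would normally escape with esc_html(); htmlspecialchars() is used here only to keep the sketch runnable on plain PHP.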

Don’t use too small chunks that you “glue” together

If you programmatically concatenate several small strings to build a word or a sentence, then you’re in for surprises.

Of course, it is nice to add an “s” at the end of a noun to get the plural form! Table → Tables; Chair → Chairs; and so on.

But hey, this doesn’t work even in English! Man → Men; Industry → Industries!

So why would you expect this to be possible in a translated version of your code? Forget the idea that you could just have a “translation” for the plural “s” that you could glue to the end of various words. It would work only for Esperanto or for languages that don’t have any plural form!

And here is the single problem I most often contact developers about. The very same problem arises if you try to stitch together various parts of a sentence. Let’s say you come up with the brilliant idea of combining the string “New ” with different nouns. By the way, did you notice that space after the word “New”? Congratulations if you did, but most translators will miss it – not all translation platforms highlight such things, and the string to be translated is usually shown without surrounding quotation marks. But even more important: such “spurious” spaces are a clear sign that you’re creating i18n trouble.

Perhaps you try to keep the number of strings down at the cost of messing up i18n: if you combine the words “New” and “Delete” with “article”, “post”, “page”, “posts” to form several macro strings (“New article”, “Delete article”, “New post”, “Delete post”, etc.), then you have just built a system that cannot be translated correctly into several languages. In Swedish, for instance, the adjective “New” needs to reflect the number and grammatical gender of the related noun:

English    | Swedish
New post   | Nytt inlägg
New page   | Ny sida
New pages  | Nya sidor

The solution is simple, although it may feel troublesome: You need to list all these possible combinations as separate strings.
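
A minimal sketch of what “separate strings” can look like in practice. The helper name myplugin_action_labels(), its array keys and 'text-domain' are assumptions made for this example; the stand-in for __() only exists so that the sketch runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: every label is a complete, separately translatable string.
function myplugin_action_labels() {
    return array(
        'new_post'    => __( 'New post', 'text-domain' ),
        'new_page'    => __( 'New page', 'text-domain' ),
        'new_pages'   => __( 'New pages', 'text-domain' ),
        'delete_post' => __( 'Delete post', 'text-domain' ),
        'delete_page' => __( 'Delete page', 'text-domain' ),
    );
}
```

The list is longer than two adjectives plus a handful of nouns, but each entry can now get its own grammatically correct translation, such as “Nytt inlägg” and “Ny sida” in Swedish.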

Don’t break strings up in arbitrary ways

This is a special case of the previous section. I often see something like the following. This was in a plugin, inside a huge printf() statement:
esc_html__( 'To prevent abuse, many email services will ', 'text-domain' ),
esc_html__( 'not', 'text-domain' ),
esc_html__( 'let you send from a different email address.', 'text-domain' )

(I have edited and shortened the text slightly.)

When you break up a string like this, there’s no guarantee that the translator will see the strings together and in the right order. If the translator wants to create a natural-sounding sentence in the target language, they may be forced to move some words or concepts between your various substrings.

For instance, in German, the main verb is usually located at the end of the sentence. The word “not” will in some cases not be translated as a separate word, and its translation may vary, depending on the context. In some cases, you may want or need to put emphasis on more than just the word “not”. Again: spurious spaces at the beginning or the end of some strings are a clear sign that something is wrong.

In this case, it would have been much better to give the translator one single string. Either include the HTML markup (formatting and/or links) directly in the string, for example:
“To prevent abuse, many email services will <strong>not</strong> let you send from a different email address.”
or perhaps “inject” the additional markup via placeholders:
“To prevent abuse, many email services will %1$snot%2$s let you send from a different email address.”
(In this case you should, of course, add a comment to the translators that clarifies what the placeholders will do!)
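
The placeholder variant could be sketched like this. The function name myplugin_from_address_notice(), the choice of <strong> tags and 'text-domain' are assumptions for this example, not the original plugin’s code; the stand-in for __() only exists so that the sketch runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: one complete sentence, with the markup injected via placeholders.
function myplugin_from_address_notice() {
    return sprintf(
        /* translators: 1: Opening tag for emphasis, 2: Closing tag for emphasis. */
        __( 'To prevent abuse, many email services will %1$snot%2$s let you send from a different email address.', 'text-domain' ),
        '<strong>',
        '</strong>'
    );
}
```

The translator now sees the whole sentence and can move the placeholders to wherever the emphasis belongs in the target language.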

Context

How would you understand a string that contains the single letter “F”? It depends, right? Perhaps we’re talking about Fahrenheit vs. Celsius. Or this might be the shorthand for “Friday”? What if you need both meanings in your project? You can’t take for granted that Fahrenheit and Friday will both translate to the same letter. And if we’re talking about days of the week, then you’re certainly in for trouble with “T” for both Tuesday and Thursday, and “S” for both Saturday and Sunday.
Another case could be the string “On”. Are we talking about “On the table”, or is this perhaps some parameter that should be translated as “On” or “Off”, depending on certain cases in the target language?
To your rescue in these cases comes _x(), which allows you to specify and limit the context in which this string is going to be used.

Another situation where context is important, but where you perhaps won’t need to use _x(), is when your string contains some variables, like “%s on %s”. In this case you should do two things. First, if there is more than one placeholder, number them to make it clearer for the translator how to change the order if needed (“%1$s on %2$s”). Secondly, you really should add a comment to the translator about the meaning and usage of the placeholders, on the line directly above the translation call. It’s done like this:

/* translators: 1: Author name, 2: Post title. */

Such comments, or in some cases _x(), may be needed more often than you expect. Will the string “Post” be used in your plugin in the head of a table where post titles are listed, or will it be shown on a button that lets a user publish something? If you need both these meanings, then you MUST use _x().
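
Here is a small sketch of both techniques. The context descriptions, the sample values and 'text-domain' are made up for this example; the stand-ins for __() and _x() only exist so that the sketch runs outside WordPress.

```php
<?php
// Stand-ins so the sketch runs outside WordPress; inside WordPress the real functions are used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}
if ( ! function_exists( '_x' ) ) {
    function _x( $text, $context, $domain = 'default' ) {
        return $text;
    }
}

// Same English word, two meanings: the context keeps them apart for translators.
$column_heading = _x( 'Post', 'table column heading: noun, a blog post', 'text-domain' );
$button_label   = _x( 'Post', 'button label: verb, to publish', 'text-domain' );

// Numbered placeholders plus a translators comment directly above the call.
$comment_line = sprintf(
    /* translators: 1: Author name, 2: Post title. */
    __( '%1$s on %2$s', 'text-domain' ),
    'Anna',
    'Hello world'
);
```

In English both “Post” strings look identical, but a translator gets two separate entries and can, for example, use a noun for the column heading and a verb for the button.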

Also: don’t take for granted that the translator will see the context of a string. If they filter the translation view to only include untranslated strings, then neighboring strings may not be visible. Or if you fix a typo in your project, then the new string will be added to the top of the translation project, far away from its related strings. Therefore, make sure that each string is self-explanatory, or be generous with comments to translators. This is especially true if your project contains a “string catalog” that just lists various possible strings for the UI, while the strings are actually “used” in some other place in the code (this happens a lot in JavaScript projects).

Don’t remove variables from your strings

Some plugins and themes contain strings like “ago” (or perhaps even “ ago”, where the string starts with a space) and try to add the variable programmatically, outside of the string itself. But this string can’t always be translated correctly, for instance into Swedish. The reason is that the single English word “ago” translates into the expression “för … sedan”, where the elapsed time needs to be inserted in the middle: “1 day ago” → “För 1 dag sedan”. Use printf() with a placeholder here (for instance “%s ago”) to make it possible to translate the string correctly.
In general, if you’re going to mix text and variables, allow the translator to choose where the variable content should be put inside the text.
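
A minimal sketch of the “ago” case with the variable kept inside the string. The function name myplugin_time_ago() and 'text-domain' are assumptions for this example; the stand-in for __() only exists so that the sketch runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: the placeholder is part of the translatable string.
function myplugin_time_ago( $elapsed ) {
    /* translators: %s: Elapsed time, for example "1 day". */
    return sprintf( __( '%s ago', 'text-domain' ), $elapsed );
}
```

A Swedish translator can now translate “%s ago” as “För %s sedan” and place the elapsed time in the middle of the expression.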

Capitalization

Although it is possible to specify in CSS that the first letter of each word (or all letters in a string) should be capitalized, please don’t do it. The use of capital letters differs a lot between various languages. (And for most non-Latin scripts there is no such thing as uppercase and lowercase.)
By the way, the handling of capitalization is one of the key differences between en_US (US English) and en_GB (UK English).

Quotation marks and punctuation marks

The use of quotation marks and other punctuation differs between languages. You should always include any such characters in your strings, instead of printing the punctuation mark in the code.

If, for instance, you add an exclamation mark at the end of a string, then in Spanish there needs to be a corresponding inverted exclamation mark (¡) earlier in the string.
Or, if you add a colon at the end of a string, then in French there should be a non-breaking space before the colon.

Layout

When a sentence is translated, it will in most cases be a bit longer than the original string. Of course, in some cases the target string may also become much shorter than the source string. Make sure that your layout is flexible enough to allow for this. Pay special attention to highly specialized abbreviations or terms that may need more words in other languages. For instance, the popular expression “24/7” may need to be explained as “around the clock, every day”, which, obviously, becomes substantially longer.

RTL (right-to-left scripts)

Some languages use scripts that are written from right to left. You need to make sure that your layout and CSS handle this correctly. You may also need to turn “right arrows” (→) into “left arrows” or include mirror images of various bullets, etc.

Ask someone who knows Arabic, Farsi, Hebrew, Urdu, etc. to check that your code works correctly.

But besides that, you don’t need to worry much. The logical order of content output is the same as in English; the software itself will make sure that the presentation happens from right to left.

Clarity of source strings; typos

The strings in your project are going to be read by hundreds of admins. Tens or hundreds of translators will do their best to understand your strings. And your strings are going to be printed on websites maybe millions of times.

Use a little extra time to make sure that your strings are correct and easy to understand. Perhaps you can ask someone who knows English well to help you. This extra effort on your behalf will quickly multiply in saved time for users and translators.

Also, if the same string occurs several times in your code (with the same meaning and usage) and you write it in exactly the same way, then it will be enough to translate that string only once. But it must be exactly the same string, with regard to capitalization, punctuation and even spaces.

Not only texts need to be translated

There are cases where it may be good to use the translation functionality for more than just translating strings. Here follow a few examples.

URLs

It’s usually smart to make it possible to “translate” URLs. If you’re linking to a Wikipedia article, for instance, then the translator would be able to link to the corresponding article in the target language. Or perhaps your own site is multilingual – then links to your site should get “translated”, too.
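
A sketch of a translatable URL. The function name myplugin_wikipedia_link(), the link text, the chosen article and 'text-domain' are assumptions for this example; the stand-in for __() only exists so that the sketch runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: the URL itself goes through the translation system.
function myplugin_wikipedia_link() {
    /* translators: URL of a Wikipedia article about i18n. Link to the article in your language if there is one. */
    $url = __( 'https://en.wikipedia.org/wiki/Internationalization_and_localization', 'text-domain' );

    $label = __( 'Read more on Wikipedia', 'text-domain' );

    return '<a href="' . htmlspecialchars( $url ) . '">' . htmlspecialchars( $label ) . '</a>';
}
```

Inside WordPress you would normally escape the two values with esc_url() and esc_html(); htmlspecialchars() is used here only to keep the sketch runnable on plain PHP.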

RTL

As I mentioned under “RTL”, you may need to enable translation of graphical elements. The natural sign for “Next article” would here be something that points left, for instance.

Fonts

Does your project use some special font? Is this font available for all possible target languages, like Russian, Urdu and Korean? If not, then it can be smart to use the translation system to let the translator choose a fallback font or switch the special font off.

Date/time/number formats

Various locales have differences in how they present date and time. Either reuse the configuration of the site or use the translation platform to allow the translator to configure suitable presentation. (If your project contains functions for data input, then you may need to think about how to let your order of input fields feel natural for everyone.)

Keyword lists

If you anywhere feel an urge to create a list of terms (perhaps some SEO keywords, or various words that need to be treated in a separate way for grammatical reasons, etc.), remember that the structure of this list, and the number of terms needed, may be different for various languages. Instead of presenting these terms one at a time, it’s better to give all of them to the translator in one single string, with comma-separated terms. (And add a very clear “translators” comment about the purpose and usage of the string.)
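
A sketch of such a comma-separated list. The function name myplugin_minor_words(), the word list, its purpose and 'text-domain' are all made up for this example; the stand-in for __() only exists so that the sketch runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Hypothetical helper: the whole list is one string, so each language can have its own length.
function myplugin_minor_words() {
    /* translators: Comma-separated list of short words to ignore when building keywords. Add or remove words as your language requires. */
    $list = __( 'a, an, the, of, and', 'text-domain' );

    // Split on commas, trim surrounding whitespace, and drop empty entries.
    $words = array_map( 'trim', explode( ',', $list ) );

    return array_values( array_filter( $words, 'strlen' ) );
}
```

Because the code only splits the translated string, a translator can supply three words or thirty without any change to the plugin.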

Don’t include unnecessary things in your strings

It’s totally OK to include some markup in a string if you need to emphasize some words or add a link. If your strings include HTML tags, please make sure that both sides of a tag pair are included in the string.
But also avoid unnecessary tags. If a string will be used as a level 3 header, you don’t need to include the surrounding “H3” tags in the string itself.

Plurals – make sure you use _n() correctly

In English, you use singular for n=1, and plural in all other cases. But did you know that even French handles this slightly differently? It uses singular also for n=0. And other languages have much more complicated rules for how different numbers steer what word forms you should use in your sentences. Luckily, you as a developer can just call _n(), and then WordPress will handle this for you.

But you need to remember that this is more complicated in many languages. Take Russian, for instance. For every pair of source strings with _n(), they’ll generate three target strings:

  • “Singular” will be used for numbers, where “one” is pronounced at the end: 1, 21, 31, 41, …, 101, etc.
  • “Dual” will be used for numbers where “two”, “three”, or “four” is pronounced at the end: 2, 3, 4, 22, 23, 24, 32, 33, 34, … 102, 103, and so on.
  • “Plural” will be used in all other cases: 0, 5–20, 25–30, etc.

Here you should also note that none of these forms is the same as the “generic, number-less plural”. Let’s look at the word, “table”, for instance:

English    | Russian
Table      | Стол (stol)
Tables     | Столы (stoly)
0 tables   | 0 столов (stolov)
1 table    | 1 стол (stol)
2 tables   | 2 стола (stola)
5 tables   | 5 столов (stolov)
21 tables  | 21 стол (stol)

(Remark: here we’re talking about the piece of furniture “table” and not data tables.)

Steered by the value of n, some languages have up to five or even six different translations for every string pair with _n()!
You as a developer need to remember the following about _n().

Strings calling _n() should ALWAYS include the steering number in the string

You should always use _n() together with printf(), and you will always need to mention the variable twice: both as the value for a placeholder in the printf(), and as the steering value of n for _n().
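
A minimal sketch of this pattern. The function name myplugin_tables_found(), the strings and 'text-domain' are assumptions for this example. The stand-in for _n() only knows the English rule (singular for exactly 1) and exists only so that the sketch runs outside WordPress, where the real _n() applies the plural rules of the active locale.

```php
<?php
// Stand-in with the English plural rule only; inside WordPress the real _n() is used.
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

// Hypothetical helper: $count appears twice, once for _n() and once for sprintf().
function myplugin_tables_found( $count ) {
    return sprintf(
        /* translators: %d: Number of tables found. */
        _n( '%d table found', '%d tables found', $count, 'text-domain' ),
        $count
    );
}
```

With the number inside both strings, a Russian translation can render 1, 21 and 101 with one form, 2 to 4 with another, and 5 to 20 with a third.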

You cannot use _n() as a shortcut for splitting between singular and plural

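The function _n() is not meant to be used for selecting either singular or generic plural. You need to do that in your code instead, perhaps by doing something like this (a sketch, where $number_of_comments and 'text-domain' are placeholders for your own names):
switch ( $number_of_comments ) {
    case 0:
        $text = __( 'No comments', 'text-domain' );
        break;
    case 1:
        $text = __( 'There’s one comment here', 'text-domain' );
        break;
    default:
        $text = sprintf(
            /* translators: %d: Number of comments. */
            _n( 'There’s %d comment', 'There are %d comments', $number_of_comments, 'text-domain' ),
            $number_of_comments
        );
}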

(Yes, in this case the singular string “There’s %d comment” will never be rendered in English.)

Being reachable

When volunteers start translating your code, you can make life much easier for them (and improve the quality of your code) by simply being available for communication.

  • Add a comment in your readme with information about how to find you.
  • Link to your GitHub project; you may get improvement suggestions, ready to merge.
  • Register in the Slack workspace for WordPress contributors in order to become easier to reach for questions from translators.
  • If you get stuck with i18n in your project, you’re welcome to join our Slack channel “polyglots” and ask your questions there. We’re a large, friendly, and truly global team, so there’s almost always someone around. And many “Polyglots” are also developers.

Feedback on this article

If you have suggestions on how to improve this article, please give me feedback. The “polyglots” channel on Slack would be great for this, but you can also try to contact me, “tobifjellner”, there directly.

Props:

Fellow members on the Polyglots team that have helped me fix mistakes: @presskopp

Hints about i18n – Let your code shine all over the world | Tor-Björn Fjellner on GSM and IT

Recipe Ingredients:

  • Use English (US)
  • Create reasonably sized text chunks
  • Don't concatenate/glue together small strings into sentences
  • Give context if needed
  • Include the variables
  • Capitalization
  • Punctuation
  • Layout and RTL
  • Check for typos
  • Not only text needs "translation"
  • Handle plurals correctly (many make mistakes here)
  • Be reachable