ICU plurals syntax in XTM Cloud

Introduction

This article provides details of how ICU plurals syntax can be processed in XTM Cloud, explaining what is possible, and what is not.


What is ICU plurals syntax?

While English nouns have only two plural forms (example: one – month, other – months), other languages can have up to six. Polish, for example, has four (one – miesiąc, few – miesiące, many – miesięcy, other – miesiąca). Using ICU plurals syntax, you can encode multiple versions of a particular sentence and have the appropriate one selected according to the value of a numeric variable. This is especially useful in the localization industry.

Sample syntax:

{num, plural, zero {Selected {num} items} one {Selected {num} item} two {Selected {num} items} few {Selected {num} items} many {Selected {num} items} other {Selected {num} items}}

The first argument (num in this example) is the name of a variable that will be used to choose which version of the sentence should be selected. The application interpreting the syntax will select the appropriate version of the sentence according to the value of this variable.

The second argument (plural or select) indicates whether the variable should be interpreted using pluralization rules or based on specific keywords (for non-numeric values). In general, if the second argument is plural, the variable from the first argument has a numeric value. If the second argument is select, the variable will hold a text value.

Following the two arguments, there is a list of keywords: “zero”, “one”, “two”, “few”, “many”, “other” when using “plural”, or a custom set of keywords when using “select”.

{gender, select, female {She is here} male {He is here} other {They are here}}

The keyword “other” must always be present, as it is the one used by default.
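To make the structure described above more concrete, here is a minimal Python sketch of how an application might split a message into its variable, type, and keyword branches, and then fall back to “other” for a select message. The function names parse_branches and resolve_select are hypothetical. The sketch assumes a single, well-formed message with balanced curly brackets and ignores apostrophe escaping. It is not how XTM Cloud or the ICU libraries implement parsing.

```python
def parse_branches(message: str) -> tuple[str, str, dict[str, str]]:
    """Split '{variable, type, keyword {text} ...}' into its three parts."""
    inner = message.strip()[1:-1]  # drop the outer curly brackets
    variable, kind, rest = (part.strip() for part in inner.split(",", 2))
    branches: dict[str, str] = {}
    pos = 0
    while True:
        open_pos = rest.find("{", pos)
        if open_pos == -1:
            break
        keyword = rest[pos:open_pos].strip()
        # Walk forward to the bracket that closes this branch,
        # counting nested brackets such as {num} along the way.
        depth = 1
        end = open_pos + 1
        while depth > 0:
            if rest[end] == "{":
                depth += 1
            elif rest[end] == "}":
                depth -= 1
            end += 1
        branches[keyword] = rest[open_pos + 1:end - 1]
        pos = end
    return variable, kind, branches


def resolve_select(message: str, value: str) -> str:
    """Return the branch for `value`, or the mandatory "other" branch."""
    _variable, kind, branches = parse_branches(message)
    if kind != "select":
        raise ValueError("this sketch only handles select messages")
    return branches.get(value, branches["other"])


message = "{gender, select, female {She is here} male {He is here} other {They are here}}"
print(resolve_select(message, "male"))     # He is here
print(resolve_select(message, "unknown"))  # They are here
```

The second call shows why “other” is required: any value without its own keyword ends up there.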

Each keyword is followed by a sentence in curly brackets, which is the version used when that keyword is selected. These sentences can contain variables in curly brackets, but do not have to do so. In “plural”, you can use # as the number variable (instead of writing the full variable name in curly brackets). If you do not want something (e.g. curly brackets) to be treated as a syntax element, you can escape it by enclosing it in apostrophes (for example, '{').

You can find out which plural form keyword corresponds to particular numbers in a particular language here: Language Plural Rules.
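As an illustration of how numbers map to keywords, the following Python sketch hard-codes the cardinal rules for whole numbers in English and Polish. The function name plural_category is hypothetical, and only these two languages and non-negative integers are covered. A real application would rely on a CLDR-based library rather than hand-written rules.

```python
def plural_category(language: str, n: int) -> str:
    """Return the cardinal plural keyword for a non-negative whole number."""
    if language == "en":
        return "one" if n == 1 else "other"
    if language == "pl":
        if n == 1:
            return "one"
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        # Every remaining whole number is "many"; in Polish,
        # "other" is only used for fractions such as 1.5.
        return "many"
    raise ValueError(f"no rules in this sketch for language: {language}")


for n in (1, 2, 5, 12, 22):
    print(n, plural_category("en", n), plural_category("pl", n))
# 1 one one
# 2 other few
# 5 other many
# 12 other many
# 22 other few
```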

The syntax permits nesting, but we recommend that you keep nesting to a minimum.

We also recommend that you always enter a full sentence inside plural syntax, instead of just the word that changes in your source language. This is because, in the target language, more words might need to change in the sentence, to translate it correctly.
You will find more details about these recommendations in an article concerning ICU plural messages: Formatting Messages.

When creating an ICU plural message, you can use the Online ICU Message Editor to check if it works as you expect.


How does XTM Cloud handle ICU plurals syntax?

To request a configuration that processes ICU plurals syntax efficiently, you need to create a ticket for the XTM Support team.

JSON files

In JSON files, we can activate our special ICU plurals parser, which interprets ICU plurals syntax and adds or removes plural forms for translation, depending on the target language.

The parser (usually a project-level filter template) needs to be selected before a file is uploaded (during project creation or before a new file is uploaded to the project), as the number of plural forms is adjusted during the initial analysis of a file.

The number of target plural forms is preconfigured in accordance with the documentation for cardinal plural rules for languages, described in Language Plural Rules. If necessary, the default plural forms can be changed for a particular language.
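The idea of adjusting plural forms to the target language can be pictured with the short Python sketch below. It is only an illustration and does not reflect the actual XTM Cloud parser: the table CARDINAL_FORMS and the function forms_for_target are hypothetical names, the table lists just four languages, and pre-filling a missing form with the source “other” text is an assumption made for this example.

```python
# Cardinal plural keywords per language, following the CLDR plural rules.
CARDINAL_FORMS = {
    "en": ["one", "other"],
    "pl": ["one", "few", "many", "other"],
    "ja": ["other"],
    "ar": ["zero", "one", "two", "few", "many", "other"],
}


def forms_for_target(source_branches: dict[str, str], target_language: str) -> dict[str, str]:
    """Return one source text per plural form that the target language needs.

    Forms the source does not have are filled from the source "other" text
    (an assumption of this sketch); forms the target does not need are dropped.
    """
    return {
        keyword: source_branches.get(keyword, source_branches["other"])
        for keyword in CARDINAL_FORMS[target_language]
    }


source = {"one": "Selected {num} item", "other": "Selected {num} items"}
print(list(forms_for_target(source, "pl")))  # ['one', 'few', 'many', 'other']
print(list(forms_for_target(source, "ja")))  # ['other']
```

With an English source, a Polish target gains the “few” and “many” forms, while a Japanese target keeps only “other”.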

To help with translation, we can extract the plural (or select) form name as part of the segment ID, which improves matching and ensures that Linguists know which form they are translating.

As with any other JSON file, other metadata can also be extracted.

Currently, when a JSON file is processed with the ICU parser enabled, you should avoid the following:

  • Putting variables in double curly brackets ({{name}}), even outside ICU syntax.

  • Using characters other than letters, numbers, underscores, and commas inside variables in curly brackets.

This is because the entire file is read with the ICU parser, and these kinds of variables cause syntax errors during analysis.
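Before uploading a file, you may want to check it for double curly brackets yourself. The following Python sketch shows one possible heuristic. The function name find_double_curly and the "$.key" path notation are hypothetical, and the check is not part of XTM Cloud. It only looks for the {{name}} pattern and does not validate the characters used inside single curly brackets. It can also report a false positive when an ICU branch consists solely of a variable, for example other {{num}}.

```python
import json
import re

# A word-only variable wrapped in double curly brackets, e.g. {{name}}.
DOUBLE_CURLY = re.compile(r"\{\{\s*\w+\s*\}\}")


def find_double_curly(node, path="$"):
    """Return the paths of all string values that contain {{variable}}."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_double_curly(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_double_curly(value, f"{path}[{index}]")
    elif isinstance(node, str) and DOUBLE_CURLY.search(node):
        hits.append(path)
    return hits


sample = json.loads(
    '{"greeting": "Hello {{name}}",'
    ' "count": "{num, plural, one {# item} other {# items}}"}'
)
print(find_double_curly(sample))  # ['$.greeting']
```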

XTM Cloud does not currently support numbered plural forms, in which a specific version of the sentence is chosen for an exact number. For example, =7 could be used to display seven days as “a week”, and =0 could be used to display “no days” instead of “0 days”.
If a numbered form is used in the source file, it will not be returned in the target file.
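To spot numbered forms in a source message before upload, a simple pattern search is usually enough. The Python sketch below is one way to do it. The function name exact_number_forms is hypothetical, and the pattern assumes that a numbered form is written as an equals sign followed by digits and then an opening curly bracket.

```python
import re

# "=<digits>" followed by the opening bracket of its branch, e.g. "=7 {".
EXACT_FORM = re.compile(r"(=\d+)\s*\{")


def exact_number_forms(message: str) -> list[str]:
    """Return the numbered plural forms (such as '=0') found in a message."""
    return EXACT_FORM.findall(message)


message = "{days, plural, =0 {no days} =7 {a week} one {# day} other {# days}}"
print(exact_number_forms(message))  # ['=0', '=7']
```

An empty list means that the message uses only keyword forms such as “one” and “other”.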

Files that do not have JSON format

For files whose format is not JSON, no ICU syntax parser is available yet in XTM Cloud. The XTM Support team can only configure custom variables that convert the syntax elements in these non-JSON files into inline tags.

Consequently, ICU plurals syntax is partially protected from modification, but the inline tags must be placed carefully to ensure that everything works as expected.

The target file will contain exactly the same plural forms as the source file, and Linguists can only fix this by adding the missing forms manually, which requires them to know the syntax and the plural forms that the target language needs. Furthermore, the way the file is segmented will not be ideal, as all the forms will often be left to be translated in one segment.