Why Are Some Segments Recognized as Non-Translatable?

Introduction

A translatable segment is a segment that contains at least one translatable element and therefore needs to be translated. Example: “12-17.01.2021 or 20-25.01.2021”. This segment is translatable because it contains the word “or”, which has a translation in the target language. Segments like this one include words, and there may be cases where you want to translate them.

A non-translatable segment, on the other hand, is a segment in which all elements are non-translatable. Example: “12-17.01.2021: 20-25.01.2021”. Here the word “or” has been replaced by a colon, so nothing is left to translate. A non-translatable element falls into one of the following categories:

  • Punctuation.

  • Number.

  • Link.

  • Currency.

  • Measurement unit.

  • XML.

  • Parameter.

  • Time.
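The rule above (a segment is non-translatable only if every element in it is non-translatable) can be illustrated with a minimal Python sketch. The regular expressions and function names below are assumptions made for illustration and cover only three of the listed categories (number, link, punctuation); they are not the rules XTM Cloud actually uses.

```python
import re

# Hypothetical patterns for three of the categories listed above.
# The real detector's rules are not described in this article.
NON_TRANSLATABLE_PATTERNS = [
    re.compile(r"\d+(?:[.,]\d+)*"),  # number, e.g. 17.01.2021
    re.compile(r"https?://\S+"),     # link
    re.compile(r"[^\w\s]+"),         # punctuation
]

# Splits a segment into links, numbers, words and punctuation runs.
ELEMENT_RE = re.compile(r"https?://\S+|\d+(?:[.,]\d+)*|\w+|[^\w\s]+")


def split_elements(segment: str) -> list[str]:
    """Return the elements of a segment, ignoring whitespace."""
    return ELEMENT_RE.findall(segment)


def is_non_translatable_element(element: str) -> bool:
    """True if the whole element matches one of the assumed patterns."""
    return any(p.fullmatch(element) for p in NON_TRANSLATABLE_PATTERNS)


def is_non_translatable_segment(segment: str) -> bool:
    """True only if the segment has elements and all are non-translatable."""
    elements = split_elements(segment)
    return bool(elements) and all(
        is_non_translatable_element(e) for e in elements
    )


print(is_non_translatable_segment("12-17.01.2021: 20-25.01.2021"))
print(is_non_translatable_segment("12-17.01.2021 or 20-25.01.2021"))
```

With these assumed patterns, the first example from this article is classified as non-translatable, while the second is not, because the word “or” matches none of the patterns.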


Non-Translatable NLP Detector

Recognition starts by dividing each segment into “tokens”. A token is a sub-string of a segment, and each token is assigned a type and a sub-type.

The mechanism for recognizing non-translatable segments has changed over time and has been improved since XTM Cloud v.13.2. (In v.13.3, the enhancement was added as standard. In v.13.2, configuration is needed.) The new NLP non-translatable segments detector can recognize all non-translatable segments identified by the old detector. In addition, due to the added enhancements, the number of non-translatables recognized has increased by 30%.
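To make the idea of tokens with a type and a sub-type more concrete, here is a small Python sketch. The `Token` class, the type and sub-type labels, and the patterns in `TOKEN_RULES` are all hypothetical and chosen for illustration; the article does not specify which types and sub-types the detector actually assigns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str     # the sub-string of the segment
    type: str     # assumed coarse label, e.g. "NUMBER"
    subtype: str  # assumed finer label within the type


# Hypothetical (type, sub-type, pattern) rules, tried in this order.
TOKEN_RULES = [
    ("NUMBER", "DECIMAL", r"\d+[.,]\d+"),
    ("NUMBER", "INTEGER", r"\d+"),
    ("WORD", "ALPHABETIC", r"[^\W\d_]+"),
    ("PUNCTUATION", "SYMBOL", r"[^\w\s]|_"),
]

# One named group per rule, so the matching rule can be looked up.
MASTER_RE = re.compile(
    "|".join(f"(?P<r{i}>{p})" for i, (_, _, p) in enumerate(TOKEN_RULES))
)


def tokenize(segment: str) -> list[Token]:
    """Split a segment into tokens and label each one."""
    tokens = []
    for match in MASTER_RE.finditer(segment):
        rule_index = int(match.lastgroup[1:])  # "r2" -> 2
        token_type, subtype, _ = TOKEN_RULES[rule_index]
        tokens.append(Token(match.group(), token_type, subtype))
    return tokens


for token in tokenize("12-17 or 20"):
    print(token)
```

In this sketch, a segment would count as translatable as soon as one of its tokens has the type `WORD`. A real detector needs many more types, for example for measurement units, currencies, and markup.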

The key areas of improvement include better handling of:

  • physical measurements.

  • markup segments (i.e. those containing HTML or XML tags).

  • URLs – more classes of valid URLs are recognized.

The recognition of non-translatable segments is hard-coded in the NLP algorithms. It is not possible to change the recognition configuration so that a particular element or segment is treated differently.