Why Are Some Segments Recognized as Non-Translatable?
Introduction
A translatable segment is a segment that contains at least one translatable element and therefore needs to be translated. Example: 12-17.01.2021 or 20-25.01.2021. This segment is translatable because the word “or” has a translation in the target language. Segments like this are classed as translatable because they contain words, and there may be cases where you want those words translated.
A non-translatable segment, on the other hand, is a segment in which all elements are non-translatable. Example: 12-17.01.2021: 20-25.01.2021. This segment consists only of numbers and punctuation, so nothing in it needs to be translated. A non-translatable element falls into one of the following categories:
Punctuation.
Number.
Link.
Currency.
Measurement unit.
XML.
Parameter.
Time.
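The rule above (a segment is non-translatable only when every element is non-translatable, and translatable as soon as one element is not) can be sketched as follows. This is a minimal illustration, assuming a simple regular-expression tokenizer and covering only a few of the listed categories (link, time, number, punctuation). The names TOKEN_RE, CATEGORY_PATTERNS, categorize and is_non_translatable are hypothetical and do not describe how XTM Cloud implements its detector.

```python
import re

# Hypothetical tokenizer: a URL, a run of word characters joined by . , - / :
# (so dates, times and decimals stay in one token), or one punctuation mark.
TOKEN_RE = re.compile(r"https?://\S+|\w+(?:[.,\-/:]\w+)*|[^\w\s]")

# Illustrative patterns for a few of the categories above (assumptions only).
CATEGORY_PATTERNS = [
    ("link", re.compile(r"https?://\S+")),
    ("time", re.compile(r"\d{1,2}:\d{2}(?::\d{2})?")),
    ("number", re.compile(r"\d+(?:[.,\-/]\d+)*")),
    ("punctuation", re.compile(r"[^\w\s]+")),
]


def categorize(token):
    """Return the non-translatable category of a token, or "word" if none applies."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "word"


def is_non_translatable(segment):
    """True only if every token falls into a non-translatable category."""
    tokens = TOKEN_RE.findall(segment)
    return all(categorize(token) != "word" for token in tokens)
```

With this sketch, is_non_translatable("12-17.01.2021: 20-25.01.2021") returns True, while the first example, "12-17.01.2021 or 20-25.01.2021", returns False because the token "or" matches none of the non-translatable categories.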
Non-Translatable NLP Detector
As a basis, segments are divided into “tokens”. A token is a substring of a segment, and each token is assigned a type and a sub-type. The mechanism for recognizing non-translatable segments has changed over time and was improved in XTM Cloud v.13.2. In v.13.2, the improvement must be enabled through configuration; in v.13.3, it was added as a standard enhancement. The new NLP non-translatable segments detector recognizes all non-translatable segments identified by the old detector. In addition, thanks to the added enhancements, it recognizes 30% more non-translatables.
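To illustrate the idea of tokens that carry a type and a sub-type, here is a hedged sketch in which each token records a coarse type (such as NUMBER) and an optional finer sub-type (such as DATE_RANGE). The Token class, the tokenize function, the rule list and all type and sub-type labels are assumptions made for this example; they are not XTM Cloud's actual token types.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    text: str               # the substring of the segment
    type: str               # coarse type, e.g. "NUMBER" (hypothetical label)
    subtype: Optional[str]  # finer sub-type, e.g. "DATE_RANGE" (hypothetical)


# (type, sub-type, pattern) rules, tried in order; more specific rules come first.
RULES = [
    ("NUMBER", "DATE_RANGE", re.compile(r"\d{1,2}-\d{1,2}\.\d{2}\.\d{4}")),
    ("NUMBER", "DATE", re.compile(r"\d{1,2}\.\d{2}\.\d{4}")),
    ("TIME", None, re.compile(r"\d{1,2}:\d{2}")),
    ("NUMBER", "INTEGER", re.compile(r"\d+")),
    ("PUNCTUATION", None, re.compile(r"[^\w\s]")),
    ("WORD", None, re.compile(r"\w+")),
]


def tokenize(segment: str) -> List[Token]:
    """Split a segment into substrings, labelling each with the first matching rule."""
    tokens = []
    pos = 0
    while pos < len(segment):
        if segment[pos].isspace():
            pos += 1
            continue
        for type_, subtype, pattern in RULES:
            match = pattern.match(segment, pos)
            if match:
                tokens.append(Token(match.group(), type_, subtype))
                pos = match.end()
                break
        else:
            pos += 1  # defensive: skip a character that no rule matches
    return tokens
```

For "12-17.01.2021 or 20-25.01.2021", this sketch yields a NUMBER/DATE_RANGE token, a WORD token and another NUMBER/DATE_RANGE token. Ordering the rules from most to least specific keeps a date range from being split into separate integers.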
The key areas of improvement include better handling of:
physical measurements.
markup segments (i.e. those containing HTML or XML tags).
URLs: more classes of valid URLs are now recognized.
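The three improvement areas can be roughly approximated with regular expressions, as in the sketch below. It is a deliberately simplified stand-in, not the NLP detector: the unit list, the tag and URL patterns, and the functions strip_non_translatables and looks_non_translatable are all hypothetical. Here a segment is treated as non-translatable when no alphabetic characters remain after measurements, markup tags and URLs are removed.

```python
import re

# Illustrative patterns only (assumptions); the real detector is NLP-based.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"]+|www\.[^\s<>\"]+")
MARKUP_RE = re.compile(r"</?[A-Za-z][\w:.-]*(?:\s[^<>]*)?/?>")
MEASUREMENT_RE = re.compile(
    r"\d+(?:[.,]\d+)?\s?(?:mm|cm|km|m|mg|kg|g|ml|l|°C|°F|%)(?!\w)"
)


def strip_non_translatables(segment):
    """Replace URLs, markup tags and measurements with spaces."""
    for pattern in (URL_RE, MARKUP_RE, MEASUREMENT_RE):
        segment = pattern.sub(" ", segment)
    return segment


def looks_non_translatable(segment):
    """True if no alphabetic character survives once those elements are removed."""
    residue = strip_non_translatables(segment)
    return not any(ch.isalpha() for ch in residue)
```

With these patterns, "<b>5 kg</b> https://example.com/a?b=1" is classed as non-translatable, whereas "<p>Weight: 5 kg</p>" is not, because the word "Weight" remains. URLs are removed before tags so that a link inside an attribute does not stop the tag from being recognized.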
The recognition of non-translatable segments is hard-coded in the NLP algorithms. It is not possible to change the recognition configuration so that a particular element is treated differently.