Reindexing, translation memory (TM) & terminology
What are Lucene indexes and how does a reindex action work?
Lucene indexes
XTM Cloud translation memory and XTM Cloud terminology both rely on Lucene indexes. These are special files created in the back end in the following sequence: for each customer → for each language used by that customer.
Reindex action
The reindexing process recreates the translation memory or terminology Lucene indexes. It is used when there are problems with TM matching in projects, or when a particular term is not highlighted in XTM Workbench.
The reindex action is not an update action: it does not change anything within XTM Cloud and does not cause downtime. Only the Lucene index is recreated.
IMPORTANT!
Reindexing translation memory and reindexing terminology are two separate operations!
To check if the reindexing action has successfully resolved the issue, you can, for instance, check if the same TM entry is displayed in XTM Workbench and in TM Manager (or, in the case of reindexing terminology, see if a particular term is properly highlighted in blue, in XTM Workbench). No additional checks are needed after the process has finished. Once the action has been performed in the reported projects, and an update has been performed within segments, all correct matches should be available, as they will be returned correctly from the TM database.
Constraints
The reindex itself is performed per client and per language. This means that, for example, Portuguese (Brazil) (pt_BR) will have its indexes recreated for all customers.
IMPORTANT!
The process for reindexing all languages (as opposed to reindexing per language) can only be scheduled during non-business hours, as it might affect users working in XTM Workbench. XTM Cloud will still be accessible, but saving segments and matching within the language that is being reindexed at that time will be suspended for the duration of the process.
The total time the process takes depends on how much TM data must be reindexed. It can take from a couple of minutes to several hours. After the process has finished, the support agent checks the logs for confirmation that the action has been performed correctly.
Terminology reindexing is performed per client, for all language combinations at once, not separately.
How to recognize translation memory/terminology indexing issues
How to recognize translation memory (TM) indexing issues
The most common symptoms of TM indexing issues are:
A particular TM match is shown in XTM Workbench, but you cannot find it in the TM Manager.
A phrase search does not return as many matches in the TM Manager as it should. When XTM International Support staff search for all phrases in the project, or for phrases by record ID, they can provide you with a list of records, but a search for the specific phrase does not return them.
Concordance is not displaying matches or is displaying too few matches.
A reindex action should also be performed after large-scale operations on the translation memory in the database (for instance, changing the status of multiple TM records from Not approved to Approved). This is something we should already be well aware of. However, if batch approval is performed via the XTM Cloud UI, no reindexing is necessary.
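The first symptom in the list above amounts to a simple set comparison. The sketch below assumes that you have noted down, by hand, the record IDs of the TM entries offered as matches in XTM Workbench and of those found in the TM Manager. It does not call any XTM Cloud API, and the function name and sample IDs are hypothetical.

```python
def records_missing_from_tm_manager(workbench_ids: set[int],
                                    tm_manager_ids: set[int]) -> list[int]:
    """Return record IDs seen as matches in XTM Workbench but not found
    in the TM Manager, sorted for stable reporting."""
    return sorted(workbench_ids - tm_manager_ids)


# Hypothetical record IDs collected manually for one language combination.
missing = records_missing_from_tm_manager({101, 102, 103}, {101, 103})
print(missing)  # [102]
```

A non-empty result is the kind of discrepancy worth reporting to the XTM International Support team.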
XTM Workbench slow performance issue
If XTM Workbench is performing exceptionally slowly, some translations inserted into segments might not be saved, which usually results in a warning message (see the screenshot below).
This situation most probably occurs because there are too many Lucene index files on the server, which, in turn, might be due to the large number of XTM Cloud customers. For example, imagine there are 10 customers with 10 languages each: that makes 100 directories with Lucene index files, each containing 2 files. This already results in 200 files for just 10 customers. As more customers with multiple language combinations are added, the number of Lucene directories increases proportionately. This might cause performance issues because XTM Cloud has difficulty accessing files in those directories due to their sheer number.
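The arithmetic in the example above can be sketched in a few lines of Python. The function name is hypothetical, and the fixed figure of 2 files per directory is taken from the example purely for illustration; real directories may hold a different number of files.

```python
def lucene_file_count(customers: int, languages_per_customer: int,
                      files_per_directory: int = 2) -> tuple[int, int]:
    """Return (directories, files) for the simplified model in the example.

    Assumes one index directory per customer/language pair and a fixed
    number of files per directory (2, as in the example above).
    """
    directories = customers * languages_per_customer
    files = directories * files_per_directory
    return directories, files


print(lucene_file_count(10, 10))    # (100, 200)
print(lucene_file_count(500, 10))   # (5000, 10000)
```

The second call shows how quickly the count grows under the same assumptions once the number of customers reaches the hundreds.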
The most effective solution is to remove some of the indexes belonging to XTM Cloud customers that are either not being used or are inactive. This action needs to be performed by the XTM International Support team, because deleting the TM for a particular customer and/or deleting the customer from the XTM Cloud UI will not remove the Lucene index files in the back end: the files are kept even for deleted customers. These indexes can also be safely restored by the reindexing process, so their customers do not need to be deleted in the UI.
How to recognize terminology indexing issues
There is only one XTM Cloud UI "symptom" that indicates a need to reindex the terminology. If a particular term is visible in the terminology database in the UI for a particular language combination, make sure that you have included it in your current project (by creating the project for the customer that holds this terminology data or by incorporating the relevant resources from another customer in the project). If you then enter XTM Workbench in "Edit" mode, activate a segment and see that the term in question is not highlighted in blue, this might be a valid reason to perform reindexing.
Also check whether the Disable term decoration option is activated globally in Configuration → Settings → Translation → Terminology → Terminology options, and whether you have it switched off during project creation.
What does the XTM International Support team need to start analyzing an issue involving TM indexing?
Check which essential information you need to provide to the XTM International Support team in the case of TM indexing issues:
the phrase you were searching for in the TM Manager/concordance,
the username or ID of the user who was performing the search,
the language combination,
the customer for which the search was performed (if in concordance).
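The checklist above can be captured as a small record so that nothing is forgotten before the team is contacted. This is only an illustrative sketch: the class, field names and sample values are hypothetical and do not correspond to any XTM Cloud API or ticket format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TmIndexingReport:
    """Information to gather before reporting a TM indexing issue."""
    phrase: str                      # phrase searched in TM Manager/concordance
    user: str                        # username or ID of the searching user
    language_combination: str        # e.g. "en_GB > pt_BR" (free-form here)
    customer: Optional[str] = None   # only needed for concordance searches
    in_concordance: bool = False

    def missing_items(self) -> list[str]:
        """Return the names of required items that are still empty."""
        required = ["phrase", "user", "language_combination"]
        if self.in_concordance:
            required.append("customer")
        return [name for name in required if not getattr(self, name)]


# Hypothetical sample values; the customer is deliberately left out.
report = TmIndexingReport(phrase="terms of use", user="jsmith",
                          language_combination="en_GB > pt_BR",
                          in_concordance=True)
print(report.missing_items())  # ['customer']
```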
IMPORTANT!
Schedule reindexing during non-working hours, as it is not recommended that projects are translated or created while reindexing is in progress. As a reference point, reindexing 1.5 million segments takes around 5 minutes on average. It is advised to run reindexing over all languages straight away, although this might take up to 6 hours.
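As a rough planning aid, the reference point above (around 5 minutes for 1.5 million segments) can be turned into a back-of-the-envelope estimate. The sketch below assumes that the duration scales linearly with the number of segments. That is an assumption made for illustration only, not a documented property of XTM Cloud, and the function name is hypothetical.

```python
REFERENCE_SEGMENTS = 1_500_000   # reference point from this article
REFERENCE_MINUTES = 5.0          # average duration for that many segments


def estimate_reindex_minutes(segments: int) -> float:
    """Estimate reindexing time in minutes, assuming linear scaling."""
    if segments < 0:
        raise ValueError("segments must be non-negative")
    return segments / REFERENCE_SEGMENTS * REFERENCE_MINUTES


print(estimate_reindex_minutes(1_500_000))   # 5.0
print(estimate_reindex_minutes(30_000_000))  # 100.0
```

Actual durations depend on the server and on how much TM data must be reindexed, so treat the result as an order-of-magnitude figure when choosing a maintenance window.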