What are Lucene indexes and how does a reindex action work?
Lucene indexes
XTM translation memory and XTM terminology both rely on Lucene indexes. These are special files created on the back end in the following order: for each customer → for each language used under that customer.
Reindex action
The reindexing process recreates the translation memory or terminology Lucene indexes. It is used when there are problems with TM matching in projects or when a given term is not highlighted in Workbench, respectively.
A reindex is not an update: nothing changes within XTM and there is no downtime; only the Lucene index is recreated.
IMPORTANT!
Reindexing translation memory and reindexing terminology are two separate operations!
To verify that the reindex resolved the issue, you can, for instance, check that the same TM entry is displayed in both Workbench and TM Manager (or, for a terminology reindex, that the given term is properly highlighted in blue in Workbench). No additional tests are needed after the process has finished. Once the reindex has been performed, all correct matches should be available in the reported projects after the segments are updated, as they will then be returned correctly from the TM database.
Constraints
The reindex itself is performed per client and per language, meaning that, for example, Portuguese (Brazil) (pt_BR) will have its indexes recreated under all customers.
IMPORTANT!
A reindex of all languages (as opposed to a reindex per language) can be scheduled only outside business hours, as it may affect users’ work in Workbench. XTM generally remains accessible; however, saving segments and matching within the language that is being reindexed are suspended for the duration of the process.
The overall duration depends on the amount of TM data that has to be reindexed and can range from a couple of minutes to several hours. After the process, the support agent checks the logs to confirm that the action was performed correctly.
Terminology reindexing is performed per client on all language combinations at once; there is no division by language.
How to recognize translation memory / terminology indexing issues?
How to recognize translation memory (TM) indexing issues?
The most common symptoms of TM indexing issues are as follows:
A given TM match is shown in Workbench, but you cannot find it in TM Manager.
A phrase search in TM Manager does not return as many matches as it should. When you search for every phrase from the project/all or by record ID, XTM returns a list of records, but it does not when you search for a specific phrase.
Concordance is not displaying matches or is displaying too few matches.
The reindexing action should also be performed after large-scale operations on the translation memory carried out directly in the database (for instance, changing the status of multiple TM records from Not approved to Approved). This is something we should already be well aware of. However, if the batch approval is done through the XTM UI, no reindex is necessary.
Workbench slow performance issue
If Workbench is performing exceptionally slowly, some translations inserted into segments might not be saved, which usually results in a warning message (see the screenshot below).
This situation is in all likelihood caused by an excessive number of Lucene index files on the server, which, in turn, might be due to a large number of XTM customers. For example, suppose there are 10 customers with 10 languages each: that means 100 directories of Lucene index files, with 2 files each, or 200 files for just 10 customers. With more customers and multiple language combinations, the number of Lucene directories increases proportionately. This can cause performance issues because XTM has difficulty accessing files in those directories due to their sheer number.
The most effective solution is to remove some of the indexes of XTM customers that are unused or inactive. This action needs to be performed by the XTM Support team, because deleting the TM for a given customer, or deleting the customer in the XTM UI, does not remove the Lucene index files on the back end; the files are kept even for deleted customers. The removed indexes can later be safely restored by reindexing, and removing them does not require deleting the respective customers in the UI.
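The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `lucene_index_footprint` is hypothetical, and the default of 2 files per directory is taken from the example above rather than from any guaranteed XTM behavior.

```python
from typing import Tuple


def lucene_index_footprint(
    customers: int,
    languages_per_customer: int,
    files_per_directory: int = 2,  # assumption: value used in the example above
) -> Tuple[int, int]:
    """Return (directories, files) for a uniform customer/language setup."""
    # One index directory per customer and per language used under that customer.
    directories = customers * languages_per_customer
    files = directories * files_per_directory
    return directories, files


directories, files = lucene_index_footprint(customers=10, languages_per_customer=10)
print(directories, files)  # 100 200
```

The count grows with the product of customers and languages, which is why removing the indexes of unused customers reduces the number of files noticeably.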
How to recognize terminology indexing issues?
There is only one symptom in the XTM UI that indicates the need for a terminology reindex. If a certain term is visible in the terminology database in the UI for a given language combination, make sure that it is included in your current project (by creating the project under the customer that holds this terminology data, or by incorporating the relevant resources from another customer into the project). If you then open Workbench in “Edit” mode, activate a segment, and the term in question is not highlighted in blue, this might be a valid reason to perform a reindex.
Furthermore, if you happen to have the Disable term decoration option activated globally in Configuration → Settings → Translation → Terminology → Terminology options:
You have it switched off during the project creation:
What does the XTM Support team need, to start analyzing the issue with TM indexing?
Please provide the following essentials to the XTM Support team in case of TM indexing issues:
phrase you were searching for in TM Manager / concordance;
username or ID of a user who was conducting the search;
language combination;
customer for which the search was done (if in concordance).
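If you collect these details in a script or a ticket template, the checklist above can be sketched as a small data structure. The class and field names below (`TmIndexingReport`, `missing_details`, and so on) are purely hypothetical and are not part of XTM or any XTM API; the sketch only checks that nothing from the list has been left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TmIndexingReport:
    """Hypothetical container for the details listed above."""
    phrase: str                      # phrase searched in TM Manager / concordance
    user: str                        # username or ID of the user who searched
    language_combination: str        # free-text language pair, format is up to you
    searched_in_concordance: bool = False
    customer: Optional[str] = None   # only needed for concordance searches


def missing_details(report: TmIndexingReport) -> List[str]:
    """Return the names of required fields that are still empty."""
    missing = []
    for name in ("phrase", "user", "language_combination"):
        if not getattr(report, name).strip():
            missing.append(name)
    # The customer is only required when the search was done in concordance.
    if report.searched_in_concordance and not (report.customer or "").strip():
        missing.append("customer")
    return missing


report = TmIndexingReport(
    phrase="",
    user="jsmith",
    language_combination="en_GB > pt_BR",
    searched_in_concordance=True,
)
print(missing_details(report))  # ['phrase', 'customer']
```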
IMPORTANT!
Please schedule reindexing outside working hours, as translating or creating projects while a reindex is running is not recommended. As a reference point, reindexing 1.5 million segments takes around 5 minutes on average. Because it is advised to reindex all languages in one go, the whole process might take up to 6 hours.
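For rough planning, the reference point above can be turned into a back-of-the-envelope estimate. The sketch below assumes that reindexing time scales linearly with the number of segments, which is an assumption and not something XTM guarantees; actual durations depend on the server and the data. The function name `estimate_reindex_minutes` is hypothetical.

```python
# Reference point from this article: about 1.5 million segments in about 5 minutes.
REFERENCE_SEGMENTS = 1_500_000
REFERENCE_MINUTES = 5.0


def estimate_reindex_minutes(segments: int) -> float:
    """Estimate reindex duration in minutes, assuming linear scaling."""
    if segments < 0:
        raise ValueError("segments must not be negative")
    return segments * REFERENCE_MINUTES / REFERENCE_SEGMENTS


print(estimate_reindex_minutes(1_500_000))  # 5.0
print(estimate_reindex_minutes(6_000_000))  # 20.0
```

Treat the result as a lower-bound guide when choosing a maintenance window, and leave generous headroom, since a reindex of all languages might take up to 6 hours.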