Cross-file repetitions and TM matching
What are cross-file repetitions?
General information
Cross-file repetitions are segments whose content is identical to the first instance of a particular segment. That first instance sits in the primary file (the file in which the segment occurs for the first time). In other words, segments in other files that repeat a segment from the primary file are called cross-file repetitions.
Project analysis runs from top to bottom. When you create a project with multiple files that share the same segment, the first instance of that segment is marked as Unmatched. Instances of that segment in other files are marked as a Repetition with this information: Repeat - matched across files.
If you see this information, it means that the segment has not yet been translated in any file.
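The top-to-bottom analysis described above can be pictured with a minimal sketch. This is a simplified model, not XTM Cloud's actual implementation: the `Segment` class, the `analyze` function, and the file names are hypothetical, TM matches are ignored, and the plain "Repetition" label for repeats inside the same file is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    file: str
    source: str
    status: str = ""


def analyze(files):
    """Walk the files in order and label every segment (simplified model)."""
    first_seen = {}  # source text -> name of the file holding the first instance
    result = []
    for file_name, sources in files:
        for source in sources:
            if source not in first_seen:
                # First instance in the project: this file is the primary file.
                first_seen[source] = file_name
                status = "Unmatched"
            elif first_seen[source] == file_name:
                # Repeat inside the same file (assumed label).
                status = "Repetition"
            else:
                # Repeat of a segment whose first instance is in another file.
                status = "Repeat - matched across files"
            result.append(Segment(file_name, source, status))
    return result


# Hypothetical project with two files sharing one segment.
project = [
    ("a.docx", ["Hello", "Save file"]),
    ("b.docx", ["Save file", "Goodbye"]),
]
labels = [(s.file, s.source, s.status) for s in analyze(project)]
```

In this model, "Save file" in a.docx is the first instance, and the same text in b.docx is the cross-file repetition.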
Once any instance of that segment is translated in any file, its translation is inserted into the other occurrences when you open them in XTM Workbench in "Edit" mode.
IMPORTANT!
Please remember that cross-file repetitions depend solely on records saved in the TM. For example, a raw MT match that is not saved in the TM will not cause cross-file repetitions to be populated with the relevant translation.
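As a rough illustration of this dependency, the sketch below models the TM as a plain dictionary. The `populate` function and the sample strings are hypothetical and only show the rule that a translation reaches other occurrences after it has been saved to the TM.

```python
def populate(source, tm):
    """Return the target for a cross-file repetition, or None if the TM has no record."""
    return tm.get(source)


tm = {}  # hypothetical in-memory TM: source text -> saved target text
mt_suggestion = "Datei speichern"  # raw MT output that has not been saved anywhere

# The MT suggestion exists, but the TM is still empty, so nothing is populated.
before_save = populate("Save file", tm)

# Once the translation is saved to the TM, other occurrences can pick it up.
tm["Save file"] = mt_suggestion
after_save = populate("Save file", tm)
```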
Repetitions and cross-file repetitions are two separate functionalities.
The repetitions update is based on the user's settings in XTM Workbench.
Cross-file repetitions, however, follow the normal updating of matches, i.e. the current segment and the two segments below it are updated.
If a particular cross-file repetition receives a TM match upon project analysis, its first instance will have an ICE/Leveraged status, whereas the instances in subsequent files will be assigned Leveraged status and will be displayed in Project Metrics (external matches always take precedence over any kind of internal matching). This happens because the first instance was not marked as a repetition when the file was analyzed.
Although a translated segment cannot receive a match from itself without reanalysis, the first occurrence can be expected to receive a match from another segment, which in this case is the child repetition.
Why are there so many repetitions in the Project Metrics?
In the vast majority of cases, this happens because two or more files containing similar segments were uploaded to a single XTM Cloud project. One file gets repetitions, and then the file that contains the parent segments is deleted from the project. As a result, you are left with only one file with “ghost cross-file repetitions” pointing to a deleted file.
In general, when files are added or deleted after project creation, a full project reanalysis is required to calculate cross-file repetitions properly.
Therefore, if only one file in an XTM Cloud project contains segments with the information Repeat - matched across files, the project has most probably not been reanalyzed since some older files were deleted. Reanalysis should be run soon after any such deletion.
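The "ghost" effect can be reproduced in a small model. This is a sketch under stated assumptions, not XTM Cloud code: `analyze` and the file names are hypothetical, labels are stored once at analysis time, and repeats inside the same file are not modelled.

```python
def analyze(files):
    """Label segments top to bottom: first instance vs. cross-file repetition."""
    first_seen = {}  # source text -> name of the file holding the first instance
    labels = {}
    for name, sources in files.items():
        for index, source in enumerate(sources):
            owner = first_seen.setdefault(source, name)
            if owner == name:
                labels[(name, index)] = "Unmatched"
            else:
                labels[(name, index)] = "Repeat - matched across files"
    return labels


# Hypothetical project: old.docx is analyzed first and holds the parent segment.
files = {
    "old.docx": ["Save file", "Open file"],
    "new.docx": ["Save file", "Print"],
}
labels = analyze(files)

# Deleting the parent file without reanalysis keeps the labels computed earlier.
del files["old.docx"]
stale = {key: value for key, value in labels.items() if key[0] in files}

# A full reanalysis recomputes the labels from the files that remain.
fresh = analyze(files)
```

Here `stale` still reports "Save file" in new.docx as a cross-file repetition even though its parent is gone, while `fresh` reports it as Unmatched.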
Why are the Project Metrics showing translations as Repetitions and not as No Matching after the Linguist translated from scratch most of the repetitions in the file?
This situation usually occurs when your project contains two files that are very close copies of one another:
Primary file (the file in which a particular segment occurs for the first time),
Copy file (the file that has the same segments as in the primary file)
The issue appears when the Linguist translates the copy file first. The cross-file repetitions in the Matches panel in XTM Workbench refer to the same segment in the primary file. Since the primary file has not been translated, the repetition does not have any target text yet. The cross-file repetitions are populated only once the primary file has been translated.
Since the Linguist translated the copy file first, those segments effectively had no TM matches, but Metrics still have to reflect the highest available TM match for the segment.
Nonetheless, you can use the Linguist’s Statistics to calculate the cost. A segment with a Repetition that has no translation is treated as a No Matching segment in Statistics.
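The difference between the two reports can be sketched as follows. This is a hedged illustration only: the dictionary keys `analysis_match` and `parent_target` and both functions are hypothetical names, and the real calculation in XTM Cloud may involve more factors.

```python
def metrics_category(segment):
    """Metrics report the match category assigned during analysis."""
    return segment["analysis_match"]


def statistics_category(segment):
    """Statistics treat a Repetition with no translation available as No Matching."""
    if segment["analysis_match"] == "Repetition" and segment["parent_target"] is None:
        return "No Matching"
    return segment["analysis_match"]


# Copy file translated first: the parent segment had no target text yet.
translated_from_scratch = {"analysis_match": "Repetition", "parent_target": None}

# Primary file translated first: the parent translation was available.
populated_from_parent = {"analysis_match": "Repetition", "parent_target": "Datei speichern"}

report = [
    (metrics_category(s), statistics_category(s))
    for s in (translated_from_scratch, populated_from_parent)
]
```

In this model, the first segment counts as a Repetition in Metrics but as No Matching in Statistics, which is why Statistics are the better basis for the cost.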
The "Hide repeated segments" option
In XTM Cloud, you can enable an option to hide repeated segments. It also applies to cross-file repetitions, except for the first instance of the segment. The setting is available in Configuration → Settings → Translation → TM → Repeats.
Once enabled, the option appears in the "create a new project" form, where you can specify the percentage of a file that repetitions must occupy for the option to take effect.
This feature hides repeated segments and populates them with the translation of the parent (original) segment upon target generation, which keeps the translation consistent.
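The two parts of this behavior, the percentage threshold and the population at target generation, can be sketched as below. Everything here is an assumption made for illustration: the function names, the "at least" comparison against the threshold, and the tuple layout of a segment are hypothetical and not taken from XTM Cloud.

```python
def should_hide_repeats(statuses, threshold_percent):
    """True when repeated segments make up at least threshold_percent of the file."""
    if not statuses:
        return False
    repeated = sum(1 for status in statuses if status == "repeat")
    return repeated * 100 / len(statuses) >= threshold_percent


def generate_target(segments, parent_targets):
    """Fill each hidden repetition with the translation of its parent segment."""
    return [
        parent_targets[source] if is_repeat else target
        for source, target, is_repeat in segments
    ]


# Two of four segments are repeats, so repetitions occupy 50% of the file.
statuses = ["new", "repeat", "repeat", "new"]
hide_at_40 = should_hide_repeats(statuses, 40)
hide_at_60 = should_hide_repeats(statuses, 60)

# Each segment is (source, own target or None, is_repeat).
segments = [("Save file", None, True), ("Print", "Drucken", False)]
parent_targets = {"Save file": "Datei speichern"}
target_file = generate_target(segments, parent_targets)
```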
Good to know!
Remember that, while repeated segments can be hidden in XTM Workbench, they are automatically populated with the newest translation in subsequent files, so they will be translated accordingly in a target file once that file is generated again. The Metrics will still display those repetitions, and the file concerned will show the original number of segments in the Project editor → Workflow section.