File processing: Frequently Asked Questions (FAQ)
- 1 Introduction
- 2 Questions and answers
- 2.1 What is hidden text in XTM Visual mode in general? When does it happen? What can be done to prevent it?
- 2.2 Why are some inline tags grouped? Why are there inline tags missing at the beginning or end of the segment?
- 2.3 Which file formats can XTM Cloud process?
- 2.4 What should I do if the target file fails to generate or does not look as expected?
- 2.5 What does the “not supported - without workflow” information mean in a project’s workflow?
- 2.6 Can XTM Cloud process plural forms/ICU plural syntax, etc.?
- 2.7 How do I use a filter template in a project?
- 2.8 Is there a difference between “parser” and “filter template”?
- 2.9 Can XLIFF source states be imported to XTM Workbench?
- 2.10 How do I change the segmentation in XTM Workbench?
- 2.11 Why is the same segment processed differently in two languages?
- 2.12 Why am I unable to merge segments in XTM Cloud?
- 2.13 Displaying source files with long names in XTM Cloud
- 2.14 What should I do if I want to configure custom variables, e.g. placeholders, in my translation files?
- 2.15 Why is some text from JSON not getting extracted for translation in XTM Cloud?
- 2.16 Why is my source file rendered as having “no content” after project analysis even though there is some translatable content in the file?
- 2.17 Is it possible to exclude highlighted parts of the text in MS Word files from translation in XTM Cloud?
- 2.18 Why am I seeing certain text in XTM Workbench twice, from MS Word source files?
- 2.19 How does XTM process newline tags? What does it look like in XTM Workbench and can I decide where to put them in the target text?
- 2.20 Why are some elements/fields/tags that exist in my source file absent from the target file?
- 2.21 Why are there so many green inline tags displayed in the source segments for a particular file in XTM Workbench?
- 2.22 Why do my source files fail at analysis although the format is supported in XTM Cloud?
- 2.23 Why is some part of the source text from a MS Word file not being extracted for translation in XTM Workbench?
Introduction
The purpose of this article is to answer common questions about the File processing module in XTM Cloud and to help resolve issues you might experience with it.
Questions and answers
What is hidden text in XTM Visual mode in general? When does it happen? What can be done to prevent it?
Hidden text in XTM Visual mode consists of elements such as image alternative texts (alt texts) associated with visual items. These texts can be displayed or hidden according to the user's preference. To show or hide hidden text in Visual mode, open XTM Workbench for your project, go to the Visual mode top bar menu, and select the relevant option as needed: Show hidden text or Hide hidden text.
For MS Word documents in particular, hidden text is usually related to metadata added while the document is being created and edited. This can include comments, timestamps, tracked changes, revision marks, etc. Such metadata is usually hidden within the file, but it may sometimes be extracted to XTM Workbench. Fortunately, this content can be excluded via ITS rules, using the after-analysis file available at the back end, which contains all the content extracted to XTM Workbench.
The recommended action is to raise a JSM ticket to the XTM International Support team with a description and examples of content that needs to be excluded.
One example of such content is shown in the screenshot below:
On the back-end side, in an after-analysis file, it looks as follows:
<w:lvlText w:val="PART%1"
xtm-id="4">PART<w:var value="%1"/>
</w:lvlText>
<w:lvlText w:val="PART %1"
xtm-id="5">PART <w:var value="%1"/>
</w:lvlText>
<w:lvlText w:val="PART %3."
xtm-id="6">PART <w:var value="%3"/>.</w:lvlText>
In this example, the w:lvlText element specifies the text that is displayed for a paragraph with a particular numbering level. It can be excluded via the ITS configuration.
Why are some inline tags grouped? Why are there inline tags missing at the beginning or end of the segment?
Inline tag grouping in XTM Cloud is a feature designed to minimize the number of inline tags displayed in a segment, so Linguists can concentrate on translation without worrying about tag placement. This feature is enabled by default and can be disabled globally, per file extension. Disabling it does not affect TM matching and does not require reanalysis.
IMPORTANT!
Currently, there is no option to enable/disable grouping tags at customer and/or project level. It can only be done at global level.
Inline tags in XTM are grouped by default if they are placed next to each other. If, during initial analysis, a TM match that has inline tags is populated to the target segment and the tags are positioned in a way that does not satisfy the grouping mechanism, they will be ungrouped. However, once the tags are moved next to each other again in the XTM Workbench session, the criteria for grouping are satisfied and the redundant tags are removed.
IMPORTANT!
Keep in mind that it is not possible to discriminate between types of inline tag that are grouped when using this option. All inline tags will always be grouped regardless of their type. For example:
span tag (e.g. <span class="uitext">),
variable tag (e.g. <span class="uitext"><MadCap:variable name="flvar-product.SC" />).
Similarly, the hiding of leading and trailing inline tags is another feature intended to improve the translation experience. It can also be disabled globally, per file extension. Both features deactivate automatically if the grouped or hidden inline tags inserted in a translation or match are in a different order from the source file. However, it is important to note that leading and trailing inline tags cannot be hidden in segments with internal matches (Repetitions and Fuzzy Repetitions).
Which file formats can XTM Cloud process?
Visit the following XTM Cloud help page to see which file formats are supported for translation in XTM Cloud: Language codes (select your XTM version and search for the phrase: Language codes).
What should I do if the target file fails to generate or does not look as expected?
If the target file fails to generate or does not look as expected, this could be for several reasons. One common issue is the misplacement or lack of inline tags, which can cause target file generation to fail. To troubleshoot this, you can check if XTM Cloud is able to provide a reason for the failure by selecting Project Editor → Files → (select the relevant target language) → Target → (click on the red exclamation mark icon). If the message does not tell you what is wrong with the file, you can create a simple TEST project that is exactly the same as your original one and try to generate the target file there. If the target file still has the same problem, you should also check for any misplaced inline tags and try again.
Another issue could be that the target file has been generated but will not open due to misplaced inline tags. To search for misplaced inline tags, select Project Editor → General info → General info, find the QA profile, and edit it to deselect all checkboxes except Order of inline tags is changed or content is missing between inline tags in target segment in the Other section. Then run the QA for the document, setting it to list only segments with warnings, and review those segments to find any inline tags whose order has been changed incorrectly.
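The idea behind such a tag-order check can be sketched in a few lines of Python. This is only an illustration and not how XTM Cloud implements its QA: the `{1}`-style tag notation and the function names are assumptions made for the example.

```python
import re

# Assumption: inline tags are written as {1}, {2}, ... in the text.
# This notation is hypothetical and only serves the illustration.
TAG_PATTERN = re.compile(r"\{(\d+)\}")


def tag_sequence(segment: str) -> list:
    """Return the inline tag ids in the order they appear in a segment."""
    return TAG_PATTERN.findall(segment)


def tag_order_changed(source: str, target: str) -> bool:
    """True if the target holds the same tags as the source, but in another order."""
    src, tgt = tag_sequence(source), tag_sequence(target)
    return sorted(src) == sorted(tgt) and src != tgt


def tags_missing(source: str, target: str) -> list:
    """Return the source tag ids that do not appear in the target."""
    tgt = tag_sequence(target)
    return [tag for tag in tag_sequence(source) if tag not in tgt]
```

A segment pair flagged by either function would be a candidate for manual review before generating the target file again.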
If target generation runs but never ends, this is likely to be due to stuck queues. In this case, you should raise a ticket with the XTM International Support team, on our Support Portal and provide the necessary information.
Lastly, if the target does not generate at all, the problem could be caused by headers. This issue is associated with the Initialization error in XTM Workbench. The solutions to this problem include reanalysis of the whole project or manual repair of the issue. More about it in this article: XTM Workbench – most common issues & troubleshooting.
What does the “not supported - without workflow” information mean in a project’s workflow?
This message means that the file intended for analysis is not supported by XTM Cloud. All the supported file formats are listed in Which file formats can XTM Cloud process?
Can XTM Cloud process plural forms/ICU plural syntax, etc.?
Yes, XTM Cloud can localize plurals in several file formats, including Android XML, DITA files, JSON with ICU syntax, PO/POT and stringsdict. However, plural localization is disabled by default. Plurals are grouped into keyword categories that name the forms: zero, one, two, few, many, and other. The other form is always required because it is used for anything that does not match another category. To find out more about how XTM Cloud processes plural forms/ICU plural syntax, read the following articles:
How do I use a filter template in a project?
To use a filter template in a project, follow these steps:
Open the project in the Project Editor.
Select the General info tab.
In the General info section, select the required filter template from the Filter template dropdown.
Select Save to close the Project Editor.
In the Project list, find the required project, click on its context menu, and select Actions → Reanalyze project.
If the project has already been created, you can choose a filter template from the same dropdown in the project’s General info section. Remember to Save the change at the bottom of the page.
If you do not want to reanalyze the whole project, you can always upload a new file to it via the Files tab screen. This file can be added to the ones that are already present or replace one of them. In this case, only newly uploaded files are analyzed with new configuration rules. If you upload a file with the same name and extension as one of the source files, it will overwrite the content of the previous file during analysis.
The new or updated configuration is applied during the analysis step. If any changes are made to the configuration, full reanalysis of the project that was using this configuration must be performed to apply the changes to all the target languages/files.
Is there a difference between “parser” and “filter template”?
“Parser” and “filter template” are both broad terms for the file processing configuration, but in XTM Cloud, we use the term "filter template". You can either choose a project-level filter template (usually just called a filter template) or use a global configuration or a customer-specific configuration.
Can XLIFF source states be imported to XTM Workbench?
For XLIFF 1.2 files, the state attribute of a target element in a trans-unit is imported as a segment state in XTM Workbench. By default, signed-off is imported as completed, but we can also prepare a different mapping configuration if you prefer other XLIFF 1.2 states (as defined in the XLIFF standard) to be mapped to XTM Workbench segment states. For more details, see How XLIFF/XLF Files are Handled as Source Files in XTM Cloud.
What is more, when uploading offline XLIFF files to a project, the relevant segments will be locked in XTM Workbench if any of the following attributes is used in the trans-unit node in the XLIFF file: xtm:locked="yes", xtm:locked="true", xtm:locked="y".
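Before uploading an XLIFF 1.2 file, it can help to list which trans-units carry a target state and which would be locked. The following Python sketch shows one way to do that. The namespace URI bound to the xtm prefix is a placeholder assumption (use the URI declared in your own files), and the sample content is invented for the illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Assumption: placeholder URI for the "xtm" prefix, not the real one.
XTM_NS = "urn:example:xtm"
LOCKED_VALUES = {"yes", "true", "y"}

SAMPLE = f"""<xliff version="1.2" xmlns="{XLIFF_NS}" xmlns:xtm="{XTM_NS}">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1" xtm:locked="yes">
        <source>Hello</source><target state="signed-off">Hallo</target>
      </trans-unit>
      <trans-unit id="2">
        <source>World</source><target state="translated">Welt</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def summarize_units(xliff_text: str) -> list:
    """Return (id, target state, locked?) for every trans-unit."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        target = unit.find(f"{{{XLIFF_NS}}}target")
        state: Optional[str] = target.get("state") if target is not None else None
        locked = unit.get(f"{{{XTM_NS}}}locked") in LOCKED_VALUES
        rows.append((unit.get("id"), state, locked))
    return rows
```

Calling `summarize_units(SAMPLE)` returns one tuple per trans-unit, which makes it easy to spot units with an unexpected state or lock flag.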
How do I change the segmentation in XTM Workbench?
If segments in XTM Workbench are separated by a dotted line, you might be able to merge them into one segment directly in XTM Workbench (for example, by right-clicking on them). Other changes require the project to be reanalyzed. Structure-based segmentation can be changed at any level by XTM International Support, or at customer level or globally in the Analysis manager in the XTM Cloud UI. Sentence-based segmentation can be changed at any level in the Filter templates feature in the XTM Cloud UI, or by XTM International Support if you are not familiar enough with regular expressions.
Why is the same segment processed differently in two languages?
The segment can be processed differently in different languages if the file contains plurals (formats: Android XML, JSON with ICU syntax, PO, stringsdict). For more details, see How to configure plurals in XTM Cloud. It is also possible that the source segment is the same but the inline tags are not grouped in one of the languages because the translation was populated from a TM match.
Also, there might be different segmentation rules in different languages (e.g. one language is being segmented after a certain character, while the other is not).
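To illustrate the plural case: languages use different sets of plural categories, so one plural source entry can call for a different number of target forms per language. The Python sketch below uses a small, hand-picked subset of the CLDR cardinal plural categories. How XTM Cloud expands plurals internally is not covered here, and the function name is an assumption made for the example.

```python
# Illustrative subset of the CLDR cardinal plural categories.
# Assumption: the rules configured on an XTM instance may differ.
PLURAL_CATEGORIES = {
    "en": ["one", "other"],
    "pl": ["one", "few", "many", "other"],
    "ja": ["other"],
    "ar": ["zero", "one", "two", "few", "many", "other"],
}


def plural_forms(source_forms: dict, target_lang: str) -> dict:
    """Map each plural category of the target language to the source text
    to translate from, falling back to the mandatory "other" form."""
    return {
        category: source_forms.get(category, source_forms["other"])
        for category in PLURAL_CATEGORIES[target_lang]
    }
```

With an English source of `{"one": "# file", "other": "# files"}`, Polish needs four forms while Japanese needs only one, which is why the same source entry can look different across target languages.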
Why am I unable to merge segments in XTM Cloud?
This might be for several reasons. One reason could be that the segments did not originate from the same paragraph in the source document. In XTM Workbench, segments from the same paragraph are separated by a dotted line, and only these can be merged. Segments from different paragraphs, separated by a solid line, cannot be merged. You can find more information about this in the following article: Merging/unmerging segments in XTM Workbench.
Another reason could be that one of the segments is in read-only mode, or one of them is a file repetition. These segments cannot be merged.
Last but not least, you might also mistakenly be trying to merge segments in the Review workflow step that does not allow for such actions (for more information see: What you can and cannot do in the Review step).
Displaying source files with long names in XTM Cloud
If you upload a source file with a very long name, you might notice that the file name is shortened automatically.
That is because the file name becomes incorrect when the path is too long. This issue is caused by operating system limitations and not XTM Cloud. The workaround is to insert the file in a ZIP file and then upload it.
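The ZIP workaround can also be scripted. The following is a minimal Python sketch using only the standard library. The function name and the archive name are assumptions made for the example.

```python
import zipfile
from pathlib import Path


def zip_source_file(source: Path, archive: Path) -> Path:
    """Store a single file in a ZIP archive under its original file name."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, not the (possibly long) directory path.
        zf.write(source, arcname=source.name)
    return archive
```

For example, `zip_source_file(Path("my-very-long-file-name.docx"), Path("upload.zip"))` creates upload.zip, which can then be uploaded instead of the original file.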
What should I do if I want to configure custom variables, e.g. placeholders, in my translation files?
In the XTM Cloud UI, you can create a filter template that handles various custom variables, in the Filter templates section.
See the article that describes the process of creating a filter template via the UI: How to create a filter template in the XTM UI and apply it to a project.
Also, make sure to visit the official XTM Group documentation, for even better comprehension of the subject: Configuring custom variables in a new template at Project level.
What is more, you can create a regular expression (REGEX) pattern that defines the variable text you wish to catch. Some common patterns are already suggested during the creation process, but you can create more complex ones yourself or use online help. Feel free to explore external tools that help you build, test, and debug REGEX patterns, such as https://regex101.com/.
If you are sufficiently familiar with REGEX patterns yourself, you can also make use of the ChatGPT AI tool, to help you with constructing appropriate patterns.
Should you need further help with setting up relevant custom variables, or with creating entire new filter templates, do not hesitate to issue a JSM ticket to the XTM International Support team (see the following article for more information: How to request a new configuration or change to an existing one).
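A pattern can also be tried out locally before it is entered in a filter template. The Python sketch below matches a few common placeholder styles. The styles chosen and the sample sentence are assumptions made for the illustration, and the regular expression dialect used by XTM Cloud may differ in details from Python's.

```python
import re

# Hypothetical placeholder styles; adapt the alternatives to your own files.
VARIABLE_PATTERN = re.compile(
    r"\$\{\w+\}"          # ${folder}
    r"|\{\{\w+\}\}"       # {{user}}
    r"|\{\w+\}"           # {count}
    r"|%(?:\d+\$)?[sd]"   # %s, %d, %1$s
)


def find_variables(text: str) -> list:
    """Return every placeholder found in the text, in order of appearance."""
    return VARIABLE_PATTERN.findall(text)
```

Running `find_variables` over a handful of real strings from your files is a quick way to confirm that the pattern catches every variable and nothing else.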
IMPORTANT!
Keep in mind, though, that not all custom variables can be converted into inline tags for certain file formats; for example, the <byte> tag for the .ts file format.
This is because the XTM custom variable processor works on TEXT nodes, not on TAGS. Therefore, the whole tag cannot be hidden in a custom variable. If the tag had any text content within it, that content could be hidden, but not the whole tag.
Tags can also be hidden with ITS rules, but the .ts file format is bilingual and cannot be processed with an ITS configuration.
Why is some text from JSON not getting extracted for translation in XTM Cloud?
There might be a number of reasons why some text from a JSON file is not extracted for translation in XTM Cloud. One of them might be the specific settings of the filter template used to parse JSON files.
For example, take a look at the following two JSON lines:
"supposeThatHttpdHasTwoHashValues": "JA_Suppose that httpd has two hash values (h1 and h2) across 1000 servers in the farm: h1 in 1 server, h2 in the rest 999 servers. In this case, minority_score(h1) = 0.999, minority_score(h2) = 0.001. Then score(h1) = -log2(0.999) * 98 + 1 = 1.14. Since minority_score(h2) < 0.5, h2 is not considered an anomaly, hence score(h2) = 100.",
"supposeThatHttpdHasTwoHashValuesH1AndH2": "Suppose that httpd has two hash values (h1 and h2) across 10 servers in the farm: h1 in 1 server, h2 in the rest 9 servers. In this case, minority_score(h1) = 0.9, minority_score(h2) = 0.1. Then score(h1) = -log2(0.9) * 98 + 1 = 15.90. Since minority_score(h2) < 0.5, h2 is not considered an anomaly, hence score(h2) = 100.",
Most of the value of the "supposeThatHttpdHasTwoHashValues" parameter is contained between underscore characters, whereas the value of the "supposeThatHttpdHasTwoHashValuesH1AndH2" parameter is not. You might have a special configuration applied in your filter template which converts anything between underscore characters into an inline tag. You can then see them in the { } Inline tags section, in XTM Workbench.
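The effect of such a rule can be reproduced with a short Python sketch. The pattern `_[^_]*_`, the `{1}`-style tag notation, and the shortened sample values are assumptions made for the illustration. The actual rule in your filter template may be written differently.

```python
import re

# Hypothetical rule: anything between two underscores becomes an inline tag.
UNDERSCORE_SPAN = re.compile(r"_[^_]*_")


def hide_underscore_spans(value: str) -> tuple:
    """Replace every underscore-delimited span with a numbered tag and
    return the remaining text together with the hidden spans."""
    hidden = []

    def to_tag(match: re.Match) -> str:
        hidden.append(match.group(0))
        return f"{{{len(hidden)}}}"

    return UNDERSCORE_SPAN.sub(to_tag, value), hidden
```

Applied to a value such as "JA_Suppose that httpd has two hash values. In this case, minority_score(h1) = 0.999.", most of the sentence ends up inside the hidden span, while a value without underscores is returned unchanged.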
Considering the above, make sure to first review the configuration of your filter template and do not hesitate to contact the XTM Support team in case of any doubts/questions.
Why is my source file rendered as having “no content” after project analysis even though there is some translatable content in the file?
In the vast majority of cases, the lack of content extracted from the source file might stem from an incorrect Filter Template that has been applied to the project. For this reason, make sure to first review the Filter Template used in this project. If you are not sure of the Filter Template’s settings, do not hesitate to contact the XTM Support team for help.
Is it possible to exclude highlighted parts of the text in MS Word files from translation in XTM Cloud?
Yes, it is possible to exclude highlighted parts of the text in MS Word files from translation in XTM Cloud. Contact the XTM International Support team to have it configured for you on the back-end side of your XTM instance. In a JSM request, provide a sample source file with the highlighted text that should be excluded from translation.
Why am I seeing certain text in XTM Workbench twice, from MS Word source files?
When translating in XTM Workbench, you might wonder why certain text is displayed in the editor twice. This is because of so-called alternate content, which XTM takes for translation by default (unless configured otherwise on your instance by the XTM International Support team). Such content is often a string of text contained in the descriptions of charts or images, or in text boxes.
Take a look at the following screenshot which presents two segments, the first of which (70) is actual content and the second one (82) is its alternate content.
For more information about the alternate content, see: How are segments in MS Office files (Word, Excel, PowerPoint) extracted?.
How does XTM process newline tags? What does it look like in XTM Workbench and can I decide where to put them in the target text?
When your source file contains a newline tag (\n), such as in Today is a \n sunny day., this text will appear in XTM Workbench as: Today is a sunny day. Here, the \n tag is displayed as a whitespace character.
By default, XTM Cloud does not break segments on newline tags. The entire text remains in one segment, with the newline tag rendered as a white space.
However, we can configure XTM Cloud to handle newline tags in other ways, based on your preference:
Break Segments on \n:
If configured, the \n will cause the text to be split into separate segments in XTM Workbench. For example:
Segment 1:
Today is a
Segment 2:
sunny day.
Display \n as an Inline Tag:
We can make the \n visible as an inline tag rather than displaying it as a whitespace character. This allows the linguist to see the newline explicitly and decide where to place it in the target segment, while translating the text in XTM Workbench.
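The three behaviours described above can be compared side by side in a short Python sketch. It only mimics the visible result and is not XTM Cloud's implementation. The `{1}` tag notation and the function names are assumptions, and the sample uses a real newline character.

```python
# Assumption: the newline is a real line-break character in the parsed text.
SOURCE = "Today is a \n sunny day."


def as_whitespace(text: str) -> str:
    """Default behaviour: one segment, newline shown as a single space."""
    return " ".join(text.split())


def split_segments(text: str) -> list:
    """Break segments on the newline and drop surrounding spaces."""
    return [part.strip() for part in text.split("\n") if part.strip()]


def as_inline_tag(text: str, tag: str = "{1}") -> str:
    """Keep one segment, but make the newline visible as an inline tag."""
    return tag.join(part.strip() for part in text.split("\n"))
```

With the third option, the Linguist sees the tag in the source segment and can place it wherever the line break belongs in the translated sentence.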
Why are some elements/fields/tags that exist in my source file absent from the target file?
XTM Cloud should retain all elements/fields/tags that exist in the source file, regardless of whether or not they are eligible for translation in XTM Workbench. If you find that some nodes from your source file are missing from the target file that you generated in XTM Cloud, there are two possible scenarios:
The data in question is absent from your source file in the first place, so double-check whether this is the case. Clients sometimes overlook this when preparing source files for translation.
There might be an issue on the XTM Cloud side that would need fixing. In such a case, once you have made sure that your source file does contain the nodes in question, do not hesitate to contact the XTM International Support team and describe the problem.
Why are there so many green inline tags displayed in the source segments for a particular file in XTM Workbench?
Sometimes you might experience a great number of green inline tags displayed in the source segments for a particular file in XTM Workbench. In the vast majority of cases, when you go to the { } Inline tags section of the docked panel, what you might observe is that those green inline tags are related to the so-called Spacing.
On the XTM Cloud side, there is not much that can be done about these inline tags. One of the core principles of file processing in XTM Cloud is to maintain file structure consistency between the source and target files. Therefore, the tags in the source file must also be included in the target file.
These inline tags are caused by differences in spacing. Many different spacing values mostly occur when source files have been converted from another format, such as PDF.
Therefore, the best way to reduce the number of tags is to preprocess the files and standardize the spacing.
Why do my source files fail at analysis although the format is supported in XTM Cloud?
Sometimes, you might experience that source files, whose format is fully supported by XTM Cloud, are not being analyzed. In the vast majority of cases, the issue stems from a faulty filter template that you might have applied for a project. To validate that, try creating a new project with the source files in question and without applying the said filter template, to see if the files will be analyzed successfully.
Whether or not the filter template turns out to be the cause of the issue, do not hesitate to contact the XTM International Support team and provide details. The team will investigate the filter template’s configuration.
Why is some part of the source text from a MS Word file not being extracted for translation in XTM Workbench?
For MS Word files, there might be several reasons why a certain part of the text is not extracted in XTM Workbench although the text in question is clearly visible in the source file. In the vast majority of cases, the issue is caused by styles that prevent the content from being sent for translation in XTM Cloud.
For example, the affected text might be marked with the Hidden option used in its chosen style.
Since the font is marked as Hidden, the text is excluded from translation by default.
If these rules need to be adjusted, consult your administrators and create an official JSM request for a configuration that allows this text to be extracted for translation in XTM Cloud.
Alternatively, you can modify this style to remove the Hidden option effect, or change the style applied to this text.
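To find out which styles in a .docx file have the Hidden option switched on, you can inspect the styles part of the document. The Python sketch below looks for the WordprocessingML w:vanish element in style definitions. The function name and the sample markup are assumptions made for the example, and the sketch does not cover Hidden formatting applied directly to a run or inherited from a parent style.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def hidden_style_ids(styles_xml: str) -> list:
    """Return the ids of styles whose run properties switch on w:vanish."""
    root = ET.fromstring(styles_xml)
    hidden = []
    for style in root.iter(f"{W}style"):
        vanish = style.find(f"{W}rPr/{W}vanish")
        # w:vanish without w:val, or with a true value, means Hidden is on.
        if vanish is not None and vanish.get(f"{W}val", "true") not in ("0", "false", "off"):
            hidden.append(style.get(f"{W}styleId"))
    return hidden
```

The styles part can be read from a .docx with the standard library, for example `zipfile.ZipFile("file.docx").read("word/styles.xml").decode("utf-8")`, and passed to `hidden_style_ids`.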
To learn more about what is and what is not extracted for translation in MS Word files, make sure to read the following section of the dedicated article: How are segments in MS Office files (Word, Excel, PowerPoint) extracted? -> Word – .doc, .docx, .rtf.