Translating PDF Source Files in XTM Cloud
Introduction
This short article explains why target files for PDF source files are not returned in PDF format, and why some PDF source files cannot be analyzed.
Explanation
Source files in PDF format are always returned as target files in DOCX format. This is because the formatting of a translation can differ from the formatting of its source text; for example, the target text may be longer or shorter than the source text. Returning the file in DOCX format makes it easy for you to adjust the formatting to suit your needs. After you download the DOCX file from XTM Cloud, you can easily convert it back to PDF format.
Because the file is first converted to a Word document, some of the formatting is lost in XTM Visual Mode. Visual Mode still provides context, as the user can see whether a piece of text is a title or the start of a new section, but it is not an exact copy of the source or the target file. Target files are provided in DOCX format, and if you save them as PDF files, you can see that the formatting is returned correctly.
File could not be analysed: invalid file
Sometimes a PDF source file fails at the project analysis stage, and a message appears stating that the file could not be analyzed.
This usually happens when the PDF file actually consists of scans or images, from which XTM Cloud cannot extract text. Unfortunately, we do not currently support OCR files.