Accessing machine translation (MT) performance data/calculating costs based on EDC
Introduction
XTM Cloud comes with handy tools/options that help you to keep track of the changes made by a post-editor in a machine-translated target segment.
Before reading the article below, we recommend that you first familiarize yourself with the step-by-step guide to using machine translation: How to enable and set up machine translation (MT).
Generating a report
The MT performance information that you can generate in XTM Cloud is contained in a Preview Extended table report (see Preview Extended table – description for more information about the report).
Once the work in a project is finished, in XTM Cloud, select Project Editor → Files → Preview → (click on the “cog” icon) → Extended table…
When generating a report, ensure that you include the columns below:
Pre-translated text,
Post-edited text,
Final text,
Edit distance score.
Once the report has been generated, the columns will look like this:
Columns – description
Pre-translated text
The column shows the raw text that comes directly from a machine translation engine, before any editing by a linguist. It does not necessarily have to be an MT match: it can be any other existing TM match that has been pre-populated in the target segment, if the project settings permit this. The bold label at the top of the text indicates that an actual match has been applied.
Post-edited text
The column shows the linguist’s very first manual revision of a text that has been machine-translated or pre-populated from a TM.
Imagine a workflow that consists of three steps: Translate → Correct1 → Correct2 and the segment to be translated from English to Polish: Alice has a cat.
When the project is created, a pre-populated match is inserted in the segment. This segment contains the text: Ala ma chomika.
In the Translate step, the first linguist translates the segment to: Ala ma psa.
Then, in the Correct1 step, the second linguist corrects the previous linguist's translation, by entering another version of it: Ala ma rybki.
Finally, in the Correct2 step, another linguist changes the translation to: Ala ma kota.
As a result, the "Post-edited text" column will only display the translation that was inserted in the Translate step, that is, the first manual revision of the pre-populated match. In our example, this is:
Ala ma psa.
Final text
The column shows the final version of the translation of a particular segment in XTM Workbench, that is, the version that will be displayed in the target document. Note that the workflow step name in bold indicates the step in which this final version was inserted or in which the segment containing it was confirmed, so it does not necessarily have to be the last step in the workflow. A linguist may have made the change in one of the earlier steps.
Edit distance score
This column contains a score which you can use to track the number of changes made by a post-editor in a machine-translated target segment. For more information, refer to the Edit distance calculation (EDC) section.
Edit distance calculation (EDC)
Definition
Edit distance calculation (EDC) is based on a metric that corresponds to the number of characters edited in a target machine-translated segment, divided by the total number of characters in that segment. The EDC feature compares all characters, white spaces and special characters in the MT output and final translation output.
Once the segment has been translated, the string from the MT match and the string from the actual translation are prepared for the CharacTER algorithm, and then all words (or characters, in the case of Asian languages) are put in alphabetical order and compared against one another using the algorithm. This calculation results in an EDC score, which ranges from 0 to 1. A score of 1 corresponds to a segment that required complete editing of the machine translation output. This would therefore be reflected by payment of the maximum possible amount.
EXAMPLE: For Chinese, if the entire segment consists of only two characters returned by an MT match, and you replace them with two different characters, the final text will be completely different from the MT output and the EDC score will be equal to 1.
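The idea behind the score can be illustrated with a short sketch. Note that this is not the CharacTER algorithm used by XTM Cloud: it is a simplified stand-in that uses plain character-level Levenshtein distance, and it assumes that the distance is divided by the length of the longer of the two strings so that the result always stays between 0 and 1. The function names `levenshtein` and `edit_distance_score` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character
                current[j - 1] + 1,      # insert a character
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def edit_distance_score(mt_output: str, final_text: str) -> float:
    """Simplified score in the range 0..1 (assumption: the distance is
    normalized by the longer string; the real EDC may differ)."""
    longest = max(len(mt_output), len(final_text))
    if longest == 0:
        return 0.0
    return levenshtein(mt_output, final_text) / longest


# Two Chinese characters replaced by two different characters.
print(edit_distance_score("你好", "再见"))  # 1.0
# MT output approved without any edit.
print(edit_distance_score("Ala ma kota.", "Ala ma kota."))  # 0.0
```

Because white spaces and special characters are ordinary characters in this sketch, they count towards the score in the same way as letters.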
IMPORTANT!
To obtain a correct value in the EDC score, you must ensure that the following criteria are satisfied:
The appropriate setting needs to be enabled in global workflow settings, as well as in the workflow settings of the project.
The EDC score is only calculated when the linguist edits and/or approves a segment that contains an MT match.
The segment must be approved.
If the MT match is approved and not edited, the EDC score will be 0 in the report.
The EDC feature does not work retroactively, meaning that the score will not be calculated for segments translated earlier (when EDC was disabled), if you decide to enable the feature at a later time.
If a segment is set as “Done” automatically, an edit to the target must be made to calculate an EDC score.
Examples
A segment has just an MT match that was automatically set as “Done”.
The linguist can change the status to “Not approved” and then approve it again – EDC will not be calculated.
A segment has an ICE match that was automatically set as “Done”, and an MT match.
The linguist can change the status to “Not approved” and then approve it again – EDC will not be calculated.
A segment has a not approved ICE match that was populated but not set as “Done” automatically, and an MT match.
The linguist approves the segment – EDC will be calculated.
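The criteria and examples above can be summarized as a small decision function. This is only a sketch of the rules as they are described in this article, not the logic implemented in XTM Cloud; the `SegmentState` class, its field names and the `edc_is_calculated` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentState:
    edc_enabled: bool    # option active globally and in the workflow step
    has_mt_match: bool   # the segment contains an MT match
    approved: bool       # the linguist has approved the segment
    auto_done: bool      # the segment was set as "Done" automatically
    target_edited: bool  # the linguist changed the target text


def edc_is_calculated(segment: SegmentState) -> bool:
    """Return True if, according to the rules in this article,
    an EDC score would be calculated for the segment."""
    if not (segment.edc_enabled and segment.has_mt_match and segment.approved):
        return False
    # A segment set as "Done" automatically needs an actual edit;
    # changing the status and approving it again is not enough.
    if segment.auto_done and not segment.target_edited:
        return False
    return True


# Example 1: MT match set as "Done" automatically, re-approved without edits.
print(edc_is_calculated(SegmentState(True, True, True, True, False)))   # False
# Example 3: match populated but not set as "Done", linguist approves.
print(edc_is_calculated(SegmentState(True, True, True, False, False)))  # True
```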
How to enable it?
As a user with the Administrator role, go to Configuration → Settings → Workflow → Workflow options and activate Calculate edit distance. Do not forget to Save the changes.
In Workflow editor, for a specific project, activate the Calculate edit distance option in any CAT tool workflow step (one which can be opened up in XTM Workbench) in which you want to perform post-editing.
Obtaining EDC information via the API
As an alternative to the standard Excel sheet, if you are using the API, you can easily obtain the most important information about a specific project, such as the:
Project name,
Customer name,
Language pair,
Segment ID,
EDC percentage.
Make use of the findProjectStatistics method in the REST API (see findProjectStatistics).
Calculating costs based on EDC
Use the EDC feature to calculate costs for post-edited machine-translated segments. To enable costs calculation based on EDC, go to Configuration → Data → Estimates → Cost settings → Cost settings → Calculate cost based on edit distance.
Matrix
When EDC and costs based on EDC are enabled, project costs will be reduced in accordance with the Edit Distance Score. The reduction is determined by a matrix located in the back-end, to which only the XTM International Support team has access.
See the default matrix for Edit Distance Score calculation immediately below:
Aggregated Edit Distance Score | Costs Reduction in % |
---|---|
0.0 - 0.024 | 60 |
0.025 - 0.049 | 56 |
0.05 - 0.074 | 53 |
0.075 - 0.099 | 49 |
0.1 - 0.124 | 45 |
0.125 - 0.149 | 42 |
0.15 - 0.174 | 38 |
0.175 - 0.199 | 35 |
0.2 - 0.224 | 31 |
0.225 - 0.249 | 27 |
0.25 - 0.274 | 24 |
0.275 - 0.299 | 20 |
0.3 - 0.324 | 19 |
0.325 - 0.349 | 18 |
0.35 - 0.374 | 16 |
0.375 - 0.399 | 15 |
0.4 - 0.424 | 14 |
0.425 - 0.449 | 13 |
0.45 - 0.474 | 12 |
0.475 - 0.499 | 11 |
0.5 - 0.524 | 9 |
0.525 - 0.549 | 8 |
0.55 - 0.574 | 7 |
0.575 - 0.599 | 6 |
0.6 - 0.624 | 5 |
0.625 - 0.649 | 4 |
0.65 - 0.674 | 2 |
0.675 - 0.699 | 1 |
0.7 - 1 | 0 |
If you would like to modify the matrix, please submit a ticket to the XTM International Support team and provide the details of your request.
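To see how the default matrix translates into a cost, consider the sketch below. It is an illustration only and not the back-end implementation: it assumes that each row of the matrix starts at its lower bound and runs up to the next row, and that the reduction is applied as a simple percentage of the base cost. The names `LOWER_BOUNDS`, `REDUCTIONS`, `cost_reduction_percent` and `reduced_cost` are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each row in the default matrix and its reduction in %.
LOWER_BOUNDS = [
    0.0, 0.025, 0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, 0.225,
    0.25, 0.275, 0.3, 0.325, 0.35, 0.375, 0.4, 0.425, 0.45, 0.475,
    0.5, 0.525, 0.55, 0.575, 0.6, 0.625, 0.65, 0.675, 0.7,
]
REDUCTIONS = [
    60, 56, 53, 49, 45, 42, 38, 35, 31, 27,
    24, 20, 19, 18, 16, 15, 14, 13, 12, 11,
    9, 8, 7, 6, 5, 4, 2, 1, 0,
]


def cost_reduction_percent(score: float) -> int:
    """Look up the cost reduction for an aggregated Edit Distance Score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("the score must be between 0 and 1")
    # Index of the last row whose lower bound does not exceed the score.
    row = bisect_right(LOWER_BOUNDS, score) - 1
    return REDUCTIONS[row]


def reduced_cost(base_cost: float, score: float) -> float:
    """Apply the reduction to a base cost (assumed simple percentage)."""
    return base_cost * (100 - cost_reduction_percent(score)) / 100


print(cost_reduction_percent(0.08))  # 49
print(reduced_cost(100.0, 0.0))      # 40.0
print(reduced_cost(100.0, 0.85))     # 100.0
```

In this sketch, a segment approved without edits (score 0) receives the largest reduction, while a score of 0.7 or higher leaves the cost unchanged.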
IMPORTANT!
The EDC feature works only for the costs based on Statistics (either Statistics source or Statistics target).
Furthermore, costs which are based on Statistics for internal matches, i.e., ICE, Leverage, Fuzzy and Repetitions, will not be reduced by EDC!
EXAMPLE: