Post-editing in practice
Post-editing is the process by which humans review, edit, and improve the quality and usefulness of machine translation output. From light to full post-editing, the service – and price – varies largely according to the needs of the translation buyer. Here is an insight into the process and the types of services available in the marketplace.
Translation buyers can choose between two levels of post-edited machine translation (PEMT): light and full. These two types differ based on the human effort required to improve the usability of machine-translated text (Figure 1). So what do they involve?
Figure 1: The journey from raw machine translation to human quality
Light post-editing converts raw MT output into understandable and usable, but not linguistically or stylistically perfect, text. An editor corrects obvious errors such as mistranslations and terminology mistakes, along with phrases or tags that shouldn’t be translated. A reader can usually determine that the text was machine-translated and touched up by a human. Language Service Providers (LSPs) tell us that a translator doing light post-editing, also called “rapid post-editing,” can produce up to 20,000 words per day versus 2,700 without MT.
Full post-editing, on the other hand, is meant to produce human-quality output. The goal is stylistically appropriate, linguistically correct text that is indistinguishable from what a good human translator can produce. This level of quality comes from a process that often involves the same number of QA checks as the traditional translate-edit-proof (TEP) process. LSPs assume that linguists can do 5,000 to 8,000 words per day of such heavy post-editing.
Creating content for post-editing
For the record, Common Sense Advisory has long characterized “post-editing” as a misnomer because it ignores “pre-editing” – that is, fixing the content before it is machine-translated to remove typos, jargon, and clumsy output. While humans are able to make such changes in a more comprehensive way, automated checkers from suppliers such as Acrolinx, SDL, and Tedopres can provide this function more efficiently. This prep work can extend to cleaning translation memories and terminology databases.
Where does the content that gets post-edited come from? The ideal, according to a recent Common Sense Advisory survey of translation buyers, is the output from commercial or open-source MT software that has been trained or customized for an organization’s terminology, style, or other linguistic requirements. These refinements are based on specialized rules, customized terminology, previous translation memories, or other linguistic assets.
However, the input to post-editing can also take the less ideal form of MT results from Google Translate or Microsoft Bing Translator, neither of which supports customization in its free variant. What’s the trade-off? Our surveys and interviews have found that more highly trained engines require less human intervention. Thus, post-editors will spend more time fixing the output of generic MT engines than cleaning up translations generated by trained software.
What quality to expect from service providers
With these two seemingly distinct types of post-editing on offer, you would expect that contracting with a translation vendor would be a simple act. It’s not. Because PEMT is a relatively new phenomenon with no standardized specification, ordering just what you want will take a little comparison shopping. At Common Sense Advisory, we reviewed the characteristics of light and full PEMT. We determined which type of PEMT addresses which translation error categories and specific issues. We based our determination on commonly used translation quality metrics such as the LISA QA Model and SAE J2450 (see Table 1).
What a particular language service provider (LSP) includes in its PEMT offering may differ, but the following table offers an idea of what you can expect. The checkmark symbol ✓ means that the listed issue is dealt with by a light or full PEMT process; the no-go mark ✗ indicates that it’s not included in that type of edit; and the asterisk * specifies that it may be included if failing to fix it would affect the meaning.
Table 1: Post-editing guidelines based on the LISA QA Model
Source: Common Sense Advisory, Inc.
Buyers and suppliers alike can expect this “no standards in place” model to change. Multiple companies, associations, universities, and individuals have worked on or proposed guidelines or standardized approaches to post-editing, quality assessment, and best practices. None have yet gained any traction in the marketplace. Our advice is to agree with your translation suppliers on exactly what light and full post-editing includes before contracting for a job.
Calculating the cost
Once you decide what you want, you’ll have to agree on what you should pay for it. Our research has found that the dominant pricing model for PEMT is word count multiplied by a percentage of the rate charged for a brand-new translation. Vendors charge between 40% and 85% of the full word rate for post-editing machine-translated European languages. That percentage reflects the range of rates for light and full PEMT across all sizes of suppliers (freelancers as well as small, medium, and large LSPs). Rates for Asian, African, and American language pairs differ.
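To make the dominant pricing model concrete, here is a minimal sketch of the arithmetic. The job size (10,000 words) and the full translation rate (0.20 per word) are hypothetical figures chosen for illustration; only the 40% and 85% bounds come from the research cited above. The function name pemt_price is likewise an assumption, not a vendor formula.

```python
def pemt_price(word_count: int, full_rate_per_word: float, pemt_fraction: float) -> float:
    """Word count multiplied by a fraction of the full per-word translation rate."""
    if word_count < 0 or full_rate_per_word < 0:
        raise ValueError("word count and rate must be non-negative")
    if not 0.0 <= pemt_fraction <= 1.0:
        raise ValueError("pemt_fraction must be between 0 and 1")
    return word_count * full_rate_per_word * pemt_fraction


# Hypothetical job: 10,000 words, assumed full rate of 0.20 per word.
WORDS = 10_000
FULL_RATE = 0.20

low = pemt_price(WORDS, FULL_RATE, 0.40)   # bottom of the reported range
high = pemt_price(WORDS, FULL_RATE, 0.85)  # top of the reported range

print(f"Full translation: {WORDS * FULL_RATE:.2f}")
print(f"PEMT range:       {low:.2f} to {high:.2f}")
```

Under these assumed figures, the post-editing quote would fall somewhere between 800.00 and 1,700.00, against 2,000.00 for a brand-new translation.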
Again, pricing isn’t that straightforward. We have also found that freelancers and small LSPs tend to show a greater preference for hourly rates than do mid-sized and large LSPs. What we’ve seen in this evolving market is that many LSPs refrain from quoting by the hour. That is because they don’t have enough data on the rate of productivity improvement or the required level of quality assurance for PEMT.
Some LSPs offer end-to-end managed services for PEMT. However, buyers can do some of the work themselves. For example, they can generate the output and send it to an LSP for post-editing. If that’s the case, the buyer will incur additional charges to cover software licenses, installation, training, development, integration, maintenance, processing, and other operational expenses.
Cloud-based, SaaS, and other remote MT solutions can lower or eliminate some of those costs. That approach will also incur fees for professional services. In this case, the MT supplier will charge for training the software or building dictionaries by language pair. However, if you use commercial MT products such as Asia Online or SDL BeGlobal, you’ll find do-it-yourself tools for training. Portals such as KantanMT, PangeaMT, or tauyou also let you do the training yourself.
Examples for light and full post-editing
Up to this point, this article has been theoretical. The big question is: what does post-edited MT look like in practice? To answer it, Common Sense Advisory asked four LSPs to provide examples of lightly and fully post-edited content. Three sent translations into English, while a fourth supplier sent an example of PEMT from English into Chinese. These are just some of the examples that we presented in our research on post-editing.
Each of the examples includes the original content (“Source”), machine translation output from a trained engine (“MT Output”), and both lightly and fully edited versions of that output (“Light PE” and “Full PE,” respectively). They also include: 1) comments on what changes were made from the MT to the post-edited variants, or from the light to the full versions; and 2) a measure of the amount of change between these variants.
The examples illustrate “edit distance” or “string similarity” metrics for quantifying post-editing changes. They measure the number of modifications – insertions, deletions, substitutions, and other changes – that the editor must make to transform MT output into fluent text in the target language without losing the meaning. Note that these metrics do not measure linguistic quality. The LSPs and their clients have separate mechanisms for that eternal challenge.
The Portuguese > English example uses SymEval (Johann Roturier) to generate scores for all segments, an overall project score, and visible differences. This example also quantifies the change between the light and full variant.
The German > English example uses the Jaro–Winkler model (Matthew A. Jaro and William E. Winkler) to gauge the similarity between two strings. In this example, the score is normalized so that 0% equates to no similarity and 100% is an exact match.
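For readers who want to see how such a similarity score behaves, the following is a textbook sketch of Jaro–Winkler similarity expressed as a percentage. It is not the tool the LSP used; the function names, the prefix scale of 0.1, and the four-character prefix cap are the conventional defaults and are assumptions here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between 0.0 (nothing in common) and 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def similarity_percent(mt_output: str, post_edited: str) -> float:
    """Normalize to the 0%-100% scale described above."""
    return round(100 * jaro_winkler(mt_output, post_edited), 1)


print(similarity_percent("MARTHA", "MARHTA"))  # 96.1
```

A high percentage means the post-editor changed little; as noted earlier, it says nothing about whether the result is linguistically good.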
Both Chinese examples use the Levenshtein distance metric (V.I. Levenshtein), with a 10-point score instead of percentages for the different distances.
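As an illustration of the underlying metric, here is a minimal sketch of Levenshtein distance. The source does not say how the LSP converted distances into a 10-point score, so the ten_point_score function below is purely an assumed mapping (10 means no edits, 0 means everything changed), included only to show how such a scale could work.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def ten_point_score(mt_output: str, post_edited: str) -> float:
    """ASSUMED mapping, not the LSP's: scale edit distance by the longer string."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 10.0
    return round(10 * (1 - levenshtein(mt_output, post_edited) / longest), 1)


print(levenshtein("kitten", "sitting"))      # 3
print(ten_point_score("kitten", "sitting"))  # 5.7
```

Because the function accepts any sequence, it can also compare lists of words rather than characters, which may suit languages with clear word boundaries; character-level comparison is the more natural choice for Chinese.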