September 2017
By Arle Lommel


Arle Lommel is a senior analyst with independent market research firm Common Sense Advisory (CSA Research). He is a recognized expert in quality processes and interoperability standards. Arle’s research focuses on technology, quality assessment, and interoperability.

Twitter: @ArleLommel

Neural machine translation offers significant advances, but challenges remain

No shift in machine translation technology has progressed quite as rapidly as the latest, much-hyped one: neural machine translation. But is it really as promising as the reports make out?

The standard joke about machine translation is that perfect MT is just five years away, and has been for fifty years. Just how true that statement is has become apparent in the last year, as claims about MT progress have undergone one of their periodic bouts of hyper-optimism. This time the cause is neural machine translation (NMT), a technique that uses artificial neural networks – an artificial intelligence approach loosely inspired by the way neurons in the brain work – to translate text from one language to another.

An examination of the technology by Common Sense Advisory (CSA Research) shows that it does represent a significant improvement over the previous state of the art, the phrase-based statistical machine translation (PbSMT) systems that continue to dominate the industry, but some developers and tech reporters oversell it. To understand why NMT is important – and what it can and cannot do – consider the following four points.

  1. NMT does not learn languages the way humans do, despite what some breathless reporting has claimed. Instead, like PbSMT, it relies on statistical correlations. The difference is that NMT can make much more complicated inferences from that data and is very good at determining correlations of correlations. Where a PbSMT system might observe that English "frog" tends to translate into German as "Frosch," a neural system could note that if the text mentions railway terms, "Herzstück" (the railway sense of "frog," a track crossing) is a much more likely translation, even if this translation occurs only a few times in a training corpus.
  2. Older systems look at n-grams: sequences of a fixed number of words. For example, a system that works with 6-grams considers chunks of up to six words. This approach works well for compact linguistic structures but has trouble with "long-distance dependencies," such as German verb phrases, whose parts may be separated by an entire clause. By contrast, NMT systems consider whole sentences at once, and researchers are now pushing them to work on entire paragraphs or even longer stretches of text. This shift makes them more sensitive to context and better at handling complex grammatical structures.
  3. NMT looks at individual characters, while phrase-based approaches look at words. This difference makes neural systems particularly good at working with morphologically rich languages, such as German or Hungarian. For example, a PbSMT system would not – without additional language technology – recognize that "speichern" and "gespeichert" are forms of the same verb. By contrast, NMT can work with patterns of characters to predict word forms it may not have seen before.
  4. Neural systems can extrapolate across multiple languages to fill in gaps in training data. This capability, called "zero-shot translation," allows NMT engines to translate between languages for which they have no direct training data by drawing on data from other language pairs. For example, if an NMT engine has English<>Greek and English<>Finnish training data but no Greek<>Finnish data, it can use what it has learned from its existing language pairs to translate between Greek and Finnish. Although the results will not be as good as for pairs where it has data, this can make the difference between having some translation and having none at all.
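Points 2 and 3 can be made concrete with a small illustration. The Python sketch below is not how any NMT system is implemented; it only demonstrates the two limitations described above. The first half slides a fixed word n-gram window over a German sentence containing the separable verb "anrufen" ("rufe ... an") and checks whether both parts of the verb ever fall inside one window. The second half compares the character trigrams of "speichern" and "gespeichert," using Jaccard overlap as a rough stand-in for the shared character material a character-aware model can exploit. The example sentence, the function names, and the choice of trigrams and Jaccard similarity are illustrative assumptions, not part of CSA Research's analysis.

```python
# Illustrative sketch only: contrasts a fixed word n-gram window with a
# character-level view. All names and examples here are hypothetical.

def word_ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def seen_together(tokens, a, b, n):
    """True if tokens a and b ever fall inside the same n-token window."""
    return any(a in gram and b in gram for gram in word_ngrams(tokens, n))

def char_ngrams(word, n=3):
    """Set of character n-grams, with < and > marking the word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def jaccard(a, b):
    """Share of n-grams two sets have in common (0.0 = none, 1.0 = identical)."""
    return len(a & b) / len(a | b)

# "Ich rufe dich morgen nach dem Abendessen wieder an." uses the separable
# verb "anrufen" (to call): "rufe" and "an" sit seven positions apart.
sentence = "Ich rufe dich morgen nach dem Abendessen wieder an .".split()
print(seen_together(sentence, "rufe", "an", 6))  # False: no 6-gram spans both parts
print(seen_together(sentence, "rufe", "an", 8))  # True: 8 is the smallest window that does

# At the word level these are simply two different strings...
print("speichern" == "gespeichert")  # False
# ...but their character trigrams overlap heavily, unlike an unrelated verb.
print(round(jaccard(char_ngrams("speichern"), char_ngrams("gespeichert")), 2))  # 0.43
print(jaccard(char_ngrams("speichern"), char_ngrams("laufen")))  # 0.0
```

A word-level view gets no signal that the two verb forms are related, while the trigram view recovers six shared pieces ("spe", "pei", "eic", "ich", "che", "her"). Likewise, a 6-gram model never sees "rufe" and "an" in the same chunk, so it cannot learn that together they mean "to call."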

These points all make NMT a powerful tool, but its abilities come at a cost. Neural networks require significantly more computing power than older statistical technologies do. They work most efficiently on graphics processing units (GPUs), the same chips used to generate on-screen images. An engine can run on a desktop processor, but it will run many times faster on GPUs; even so, large-scale deployments typically require more dedicated hardware than a PbSMT system would.

CSA Research finds that NMT is much more fluent than older systems – that is, its output sounds more natural than typical "machine translationese." As a result, readers find it easier to understand and interact with, but that fluency becomes a liability when it masks errors in accuracy. When PbSMT engines produced incorrect results, it was very often obvious because the output was simply unreadable. With NMT, such errors may not be obvious, and readers may not realize they have read an incorrect translation.

CSA Research also notes that common automatic quality metrics such as BLEU tend to be less reliable for NMT than for statistical systems: a marked and obvious increase in perceived quality from a neural engine may not be reflected in a higher BLEU score. The industry has yet to develop quality metrics suited to these systems, although research from the German Research Center for Artificial Intelligence (DFKI) into the specific types of errors produced by different MT engines suggests new approaches that may work better.
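To see how an n-gram-overlap metric can disagree with human judgments of fluency, here is a minimal sentence-level BLEU sketch in Python. It follows the usual recipe (clipped n-gram precisions, their geometric mean, and a brevity penalty) but is restricted to bigrams in the demo so that the short example sentences do not score zero. The reference sentence and both candidates are invented for illustration: one candidate scrambles the reference's own words, the other is a fluent paraphrase that uses different words. This is a toy under stated assumptions (single reference, no smoothing, whitespace tokenization), not a production BLEU implementation, and it shows only one plausible reason for the mismatch described above.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most as
    many times as it occurs in the reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

def sentence_bleu(candidate, reference, max_n=4):
    """Toy single-reference BLEU: brevity penalty times the geometric mean of
    clipped 1..max_n-gram precisions, with no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is simply zero
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical example sentences, chosen only to illustrate the effect.
reference = "the committee approved the new budget on monday".split()
scrambled = "the committee approved the budget new on monday".split()
paraphrase = "on monday the panel signed off on the new budget".split()

print(round(sentence_bleu(scrambled, reference, max_n=2), 2))   # 0.76
print(round(sentence_bleu(paraphrase, reference, max_n=2), 2))  # 0.45
```

Here the scrambled sentence, which no human reader would accept, outscores the fluent paraphrase because it reuses more of the reference's exact words and word pairs. A system whose gains come mainly from more natural phrasing can therefore see a smaller BLEU improvement than readers' judgments would suggest.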

NMT has seen rapid uptake among machine translation developers. Google, Microsoft, and Facebook have all deployed the technology for at least some portion of their translation needs. Systran and SDL have both released NMT services aimed at the needs of enterprises and language service providers, and many smaller MT-centric companies have deployed it. At the same time, the processing requirements for these systems are decreasing as developers optimize code and find more efficient approaches. As a result, the shift to neural technology is proceeding much more rapidly than previous shifts in machine translation technology, and NMT is already a mainstream solution.

Because NMT appeared on the scene only recently, companies looking to shift to it may struggle to find experts who know how to build and train these systems. Open-source versions of key software lower this barrier, but training is a far more sensitive issue for neural engines: a mistake may require retraining the engine from scratch rather than simply removing the bad training data.

Observe or adopt?

Overall, CSA Research finds that NMT is a major advance in the state of the art and that it will rapidly displace PbSMT systems in many scenarios. Development and performance are progressing quickly, even if tech reporters consistently oversell its potential and misunderstand the fundamental technology. Enterprises should not rush to junk functioning PbSMT systems built on the open-source Moses toolkit, but they should keep an eye on developments in this area and prepare to adopt NMT once price and performance calculations show that it will deliver an advantage. It will not replace human translators anytime soon – if anything, it will assist them as part of augmented translation workflows – but it will open up new areas for translation where today's only alternative is no translation. As such, it should be a welcome addition for governments, support departments, and enterprises concerned about their ability to interact with citizens and customers across language barriers.