August 2016
By Lou Cremers


Lou Cremers ran the internal translation services of Océ Technologies until 2006, when he started Syn-Tactic, which specializes in translation technology. He currently also works for 8vance Matching Technologies as a language technology specialist.


Lou.Cremers[at]syn-tactic.com
http://syn-tactic.com

English as a global language

Multinational companies that export to many different countries might well spend millions of euros per year on documentation and localization. An adequate language and automation strategy helps to drastically reduce localization costs and speed up the documentation cycle. The Dutch company Océ shares its journey of improving content quality and translatability.

Some years ago, the multinational manufacturer of copying and printing equipment, Océ Technologies, faced a great challenge: how to improve documentation quality while reducing costs. Although the actual case described in this article dates back a few years, the motivations, principles, and choices are still valid.

The language factor

Because our technical writers were mostly Dutch technicians writing the English source documents for user and technical service documentation, the quality of these source documents varied considerably in both style and grammar. Due to a lack of terminology management, the documentation was also terminologically inconsistent. This in turn drove up localization costs. Part of the problem had been addressed by introducing a translation memory (TM) and rule-based machine translation (MT). A great deal of effort was also invested in a fully automated workflow, which allowed

  • automatic processing of translation requests including analysis of the source documents,
  • matching against all existing translation memories,
  • creating a project memory,
  • translating parts of documentation not found in the translation memories by machine translation, and
  • creating a translation package and sending it by email or FTP to a localization vendor for post-editing.
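The workflow steps listed above can be sketched roughly as follows. This is a minimal illustration, not Océ's actual system: the `TranslationPackage` class, the `machine_translate` placeholder, the exact-match lookup and the sample sentences are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationPackage:
    """Bundle that would be sent to the localization vendor for post-editing."""
    project_memory: dict = field(default_factory=dict)      # source -> TM translation
    machine_translated: dict = field(default_factory=dict)  # source -> raw MT output


def machine_translate(segment):
    """Placeholder for a rule-based MT engine (hypothetical)."""
    return f"[MT] {segment}"


def build_package(segments, translation_memories):
    """Match each segment against all TMs; route unmatched segments to MT."""
    package = TranslationPackage()
    for segment in segments:
        for tm in translation_memories:
            if segment in tm:
                # Exact match found: copy it into the project memory.
                package.project_memory[segment] = tm[segment]
                break
        else:
            # No TM contained the segment, so fall back to machine translation.
            package.machine_translated[segment] = machine_translate(segment)
    return package


# Sample data invented for illustration only.
tms = [
    {"Switch off the printer.": "Schalten Sie den Drucker aus."},
    {"Open the front door.": "Öffnen Sie die vordere Tür."},
]
package = build_package(
    ["Switch off the printer.", "Open the front door.", "Remove the jammed paper."],
    tms,
)
```

In this sketch, only the third sentence ends up in `package.machine_translated`. A real workflow would also handle fuzzy matches and the delivery by email or FTP, which are omitted here.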

To make further improvements, we had to shift our attention to the front end of the process: the use of language in the document creation cycle. There was another compelling reason for improving the source documentation: At a later stage, an XML-based content management system (CMS) was going to be introduced. As one of the main reasons for introducing a CMS is to optimize reusability, it made sense to ensure that the CMS content would consist of generic, well-written chunks of information right from the start. In preparation, other improvements had already been introduced, such as an internal Text & Structure Course, which gave instruction on

  • creating a clear and consistent document structure,
  • writing generic and reusable information, and
  • formulating concise, correct and standardized source documents.

A set of writing principles further helped to improve the general quality of content by establishing how to

  • group information into manageable chunks,
  • group similar items and exclude unrelated items,
  • label information chunks,
  • use terms consistently in chunks and labels,
  • use tables, illustrations and diagrams,
  • write generically, and
  • provide relevant topic titles.

However, the quality of the source text in terms of terminology, style and grammar remained an issue. From an organizational point of view, we had a few options:

  • Hire native English-speaking technical writers
  • Educate our existing technical writers and take them to an advanced level of English
  • Author in the native language (in this case Dutch) and have it translated into English before translating it into other languages (by TM and MT)
  • Author in a controlled language

The first option meant that existing staff with long-term experience of Océ products would have to be replaced by new, inexperienced writers. This was out of the question. The second option, educating technical writers to a superior level of English, would take considerable time and financial investment. Besides, success would be uncertain: Technical skills and language skills often don’t mix well. The third option would have been feasible, as it would allow English to be used as a well-supported pivot language for machine translation. However, it was discarded as well: Removing the language barrier might only encourage authors to write even longer texts and more complicated sentences, both of which had to be avoided because they would reduce the efficiency of machine translation. We therefore chose the fourth option and adopted a controlled language tool.

Controlling language

We had previously created guidelines for the authors, with a special focus on machine translation. Even simple rules can make quite a difference, as the example below shows: Inserting an article disambiguates the sentence and results in a correct translation. The guidelines could be considered a first step towards a controlled language (CL).

Original sentence:

paint surface between lines.

MT of original sentence:

Italian: vernice di superficie tra le righe.
German: Farbeoberfläche zwischen Zeilen.

Corrected sentence:

paint the surface between the lines.

MT of corrected sentence:

Italian: verniciare la superficie tra le righe.
German: Malen Sie die Oberfläche zwischen den Zeilen

Here are some examples of the recommendations included in the guidelines:

  • Write short sentences
  • Use punctuation wherever appropriate
  • Write in the active voice
  • Write grammatically complete sentences
  • Use articles

However, the problem with guidelines was that to be effective, they had to be read, applied and finally remembered. A tool to enforce the guidelines was the next logical step.

There was a limited choice of tools for a controlled language at the time, and Océ chose the MAXit Simplified Language tool by Smart Communications. It had a plug-in for several authoring environments including Word and FrameMaker, which Océ was using at the time. MAXit was chosen because it was relatively easy to implement, maintain and use. It provided the following features:

  • It analyzed text on the basis of ten rules
  • It generated 40 different error messages
  • It assigned color codes to specific errors
  • It proposed improvements
  • It had a core, a custom and a synonym dictionary
  • It could be used (almost) "off the shelf"

The simplified language tool operated on the basis of ten general rules:

  • Write positive statements
  • Write short sentences
  • Express one thought per sentence
  • Use approved terminology
  • Write simple sentences
  • Use active voice rather than passive voice
  • Avoid gerunds and verbs used as a noun
  • Avoid conditional sentences
  • Avoid word clusters
  • Use valid abbreviations
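To give an impression of how a checker can enforce rules like these, here is a minimal sketch covering three of them (short sentences, approved terminology, active voice). It does not reproduce MAXit's actual rules or messages: the word limit, the synonym dictionary and the passive-voice heuristic are assumptions chosen for illustration.

```python
import re

MAX_WORDS = 20  # assumed limit; the actual threshold is not stated in the article

# Hypothetical synonym dictionary: non-approved term -> approved term
SYNONYMS = {"photocopier": "copier", "turn off": "switch off"}

# Crude passive-voice heuristic: a form of "to be" followed by a word ending in -ed
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def check_sentence(sentence):
    """Return a list of messages for the rule violations found in one sentence."""
    messages = []
    words = sentence.split()
    if len(words) > MAX_WORDS:
        messages.append(f"Sentence too long ({len(words)} words, limit {MAX_WORDS})")
    lowered = sentence.lower()
    for term, approved in SYNONYMS.items():
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            messages.append(f"Use approved term '{approved}' instead of '{term}'")
    if PASSIVE.search(sentence):
        messages.append("Possible passive voice; prefer the active voice")
    return messages


flagged = check_sentence("The photocopier is switched off by the operator.")
clean = check_sentence("Switch off the copier.")
```

The first sentence triggers a terminology message and a passive-voice message, while the second passes without messages. A heuristic this simple misses irregular participles and can raise false alarms, which is one reason why authors need to interpret a checker's feedback rather than accept it blindly.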

Implementation

To prepare for the introduction of the controlled language tool, the translation department had created a comprehensive technical terminology database, which was also available throughout Océ via the intranet. The dictionary for simplified English was essentially a subset of the terminology database, containing a carefully selected set of technical domain terms.

MAXit’s simplified language rules were adopted with two changes. One rule was added: the "Dutch glue" rule, which alerts authors when they "glue together" single English words into compounds. One rule was discarded to accommodate technical service manuals: the rule to spell out numbers below ten. We also set up an initial workflow to handle technical writers’ proposals for terms that were not (yet) part of the terminology base.
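A rule like "Dutch glue" could, in principle, be approximated by checking whether an unknown word splits into two known words. The sketch below illustrates that idea only; it is not how the rule was actually implemented in MAXit, and the dictionary and function names are hypothetical.

```python
# Hypothetical dictionary of approved single words (illustrative only)
DICTIONARY = {"paper", "tray", "print", "head", "toner", "cartridge", "the", "open", "clean"}


def split_glued(word, dictionary=DICTIONARY):
    """Return (left, right) if `word` is two dictionary words glued together, else None."""
    word = word.lower()
    if word in dictionary:
        return None  # a known word is not a glue candidate
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            return left, right
    return None


def check_glue(sentence):
    """Collect a warning for every glued compound found in the sentence."""
    warnings = []
    for token in sentence.split():
        word = token.strip(".,;:!?")
        parts = split_glued(word)
        if parts:
            warnings.append(f"'{word}' looks glued; consider '{parts[0]} {parts[1]}'")
    return warnings


warnings = check_glue("Open the papertray and clean the printhead.")
```

Here `warnings` contains one message for "papertray" and one for "printhead". A production rule would need a much larger dictionary and an exception list for compounds that are legitimately written as one word.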

Hands-on classroom training explained the principles behind a controlled language and the reasons for its introduction. It taught technical writers

  • The principle that what they write is not necessarily what they mean
  • How to spot and correct common grammatical mistakes
  • That the English language can be rather ambiguous

Controlled language tends to limit the freedom of technical writers to express themselves. Moreover, it changes their way of working. We anticipated that this might lead to resistance towards the tool.

The classroom-training course proved to be instrumental in overcoming this initial resistance and contributed in no small part to the success of the controlled language implementation. In some cases, authors even became over-enthusiastic, in the sense that they spent too much time creating CL-compliant sentences.

Other factors favoring acceptance included:

  • Technical writers have a technical rather than a linguistic background and more easily accept the restrictions if the tool actually supports them
  • The authors were non-native speakers and more inclined to accept linguistic limitations

Failing to involve authors and translators in the whole process would have jeopardized the entire implementation or at least reduced its efficiency. Change management is therefore an important factor to consider.

Positive aspects

Overall, the positive effects of the controlled language were clear:

  • Better-quality source texts through improved readability and intelligibility
  • Clear and grammatically correct source documentation (from non-native writers)
  • Improved content management system efficiency due to the re-use of already written pieces of information across products
  • Higher reuse of translated sentences in translation memories (for the same reasons)
  • A more standardized writing style
  • Consistent use of terminology and much more control over terminology than before
  • Shorter sentences and more concise writing in general
  • Reduction of superfluous information

The company decided against rewriting legacy material in controlled language. It is therefore difficult to quantify the effect of the controlled language on text volume. In one instance, however, rewriting did take place and showed an approximate reduction of 35 percent in the word count. Rewriting legacy text is fairly tedious and time-consuming, as one must resolve all of the checker’s compliance error messages. Nevertheless, a positive side effect is that superfluous information is deleted, which results in concise and to-the-point information.

The effects of controlled English can be seen in this example taken from one of the manuals:

Original sentence:
In several windows, an icon shows the current status/ activity of a printer. See the list below for a description of each status.

Translation of the original into German:
In mehreren Fenstern zeigt ein Ikon den laufenden Status/ Aktivität eines Druckers. Siehe die Liste zwecks einer Definition jedes Status unten

Controlled Language sentence:
These icons show the status or activity of the copier.

German machine translation of the Controlled English sentence:
Diese Ikone zeigen den Status oder die Aktivität des Kopiergeräts.

Table 1 shows some additional examples from a comparative study.

Table 1: The effects of Controlled Language on technical writing and translation.
Source: Océ Technologies

 

Negative aspects

There were also negative effects from this controlled language (or rather simplified English) implementation.

Simplified English tends to be rather constrictive and inflexible. Using the checker is initially very time-consuming, because resolving all flagged errors takes time. After a while, however, the feedback leads technical writers to avoid certain mistakes in the first place instead of correcting them afterwards.

Some authors overrated the tool, blindly accepting the checker’s suggestions and producing grammatically incorrect sentences. Therefore, the training sessions had to emphasize the fact that the checker is not a miracle tool or the ultimate solution, but rather a supporting instrument.

Some authors also engaged in so-called "black authoring": They would try to correct all color-coded messages, i.e. resolve all errors flagged by the checker. This wound up taking far too much time, and sometimes the results were even grammatically wrong. Using the tool requires discipline and some basic linguistic knowledge in order to interpret the proposed corrections accurately.

In addition, the rules for controlled language were not always compatible with the rules for improving machine translation.

Controlled language and machine translation

With regard to machine translation, controlled language improved the output quality: The improved readability and the absence of ambiguity made the translation process less error-prone, both for human translators and MT systems. In general, the rules for CL also apply to MT.

However, there are also some negative effects:

  • Controlled language and machine translation have different goals and different analysis engines. Ideally, MT and CL analysis should be performed by the same engine and have the same underlying set of rules.
  • The goal of CL is to facilitate human comprehension of a text, whereas the goal of writing rules for MT is to control the source text so that it yields a better translation result.
  • The verb set for CL contains too many general verbs, which leads to translation problems as MT requires a detailed context. For MT, the more specific the verbs, the better the output quality.

The principles of structured writing favor the transformation of long descriptions into lists and tables. Although these can greatly enhance readability, they can also cause ambiguity due to extreme conciseness, especially in a morphologically poor language like English. Human readers usually benefit from this conciseness, but it certainly presents a problem for a rule-based MT system, which relies on syntactic analysis.

Conclusion

For Océ, the implementation of a controlled language tool made a significant contribution to the reuse of both source documentation and translations, and to the consistency of the technical terminology. Although Océ experienced some setbacks with the use of the controlled language, the concept has helped to save considerable time and money as part of an automated workflow in combination with several additional language and translation technologies.