
Five principles of terminology management
Many companies maintain terminology lists of varying size, and in most cases each list has its own structure. Such lists are usually created by employees in spreadsheet software. This article shows how terminology can be captured in tables and which tabular structures are supported by terminology tools.
There are five basic principles for organizing terminology in a table: 1. concept orientation, 2. elementary nature, 3. granularity, 4. autonomy of terms and 5. few mandatory fields. These principles are also part of a standard for capturing terminology in tables that has been approved by the Deutscher Terminologie-Tag e. V. The standard is explained in the collected volume “Terminology Work: Best Practices” in the chapters “Principles and methods” and “Tools and technologies” [1].
1. Concept orientation
A terminology database is not a dictionary. Whereas a dictionary starts with the term in a given language and captures all its possible meanings, a terminological entry is based on the concept, that is, the conceptual content, to which the terms in various languages correspond. A typical “glossary”, a name that is used erroneously for many terminology lists, in most cases captures only the “permitted” terms in a source language and equates these with the “permitted” terms in the various target languages.
In this structure, however, the author can neither record the “forbidden” terms nor manage additional information pertaining to these terms. Moreover, a single term may have several translations: the German terms “Bank” and “Kupplung”, for example, are homonyms.
Fig. 1: A typical “glossary list” used in companies (graphic only available in German)
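The difference between a glossary and a concept-oriented entry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, the status values and the sample entries (including the "forbidden" term) are hypothetical and are not taken from any tool or from the DTT standard.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    text: str
    language: str                # e.g. "de", "en"
    status: str = "permitted"    # hypothetical values: "permitted" / "forbidden"

@dataclass
class Concept:
    concept_id: int
    definition: str
    terms: list = field(default_factory=list)

# Two separate concepts that share the German homonym "Bank".
bank_institution = Concept(1, "financial institution", [
    Term("Bank", "de"),
    Term("bank", "en"),
    Term("Geldinstitut", "de", status="forbidden"),  # illustrative only
])
bank_seat = Concept(2, "long seat for several people", [
    Term("Bank", "de"),
    Term("bench", "en"),
])

def equivalents(concepts, text, source_lang, target_lang):
    """Return {concept_id: [target terms]} for every concept containing `text`."""
    result = {}
    for concept in concepts:
        if any(t.text == text and t.language == source_lang for t in concept.terms):
            result[concept.concept_id] = [
                t.text for t in concept.terms if t.language == target_lang
            ]
    return result

print(equivalents([bank_institution, bank_seat], "Bank", "de", "en"))
```

Because the lookup starts from the concept, the homonym yields one result per concept instead of a single ambiguous translation, and forbidden terms can be stored alongside the permitted ones.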
2. Elementary nature
Several pieces of information should not be included in one data category. One practice that is extremely problematic, but quite common, is appending the country code, gender or other information directly to the term in the term field, such as PT for Portugal or US for the USA. With additions such as these, filters no longer work properly, because the column no longer contains the term alone. The terminology recognition of a translation memory system or a terminology checker will not find any matches either, since the text is searched for the full content of the column.
When a definition is created, its source and year should also be specified as far as possible. Strictly speaking, three different data categories are necessary for this: definition, source of the definition and year of the definition. For practical reasons, users may deviate from the principle of elementary nature in certain places.
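An existing list that violates this principle can often be repaired with a small script. The following sketch assumes that the addition is a two-letter upper-case code in parentheses at the end of the term field, for example "elevator (US)". The pattern, the function name and the column names are assumptions and would have to be adapted to the conventions actually used in a given list.

```python
import re

# Assumed convention: term followed by a two-letter code in parentheses.
TRAILING_CODE = re.compile(r"^(?P<term>.+?)\s*\((?P<code>[A-Z]{2})\)$")

def split_term_field(raw):
    """Split a mixed term field into separate 'term' and 'region' values."""
    cleaned = raw.strip()
    match = TRAILING_CODE.match(cleaned)
    if match:
        return {"term": match.group("term"), "region": match.group("code")}
    # No recognizable code: keep the term and leave the region empty.
    return {"term": cleaned, "region": ""}

print(split_term_field("elevator (US)"))
print(split_term_field("lift"))
```

After the split, the term column contains only the term and can again be used for filtering and for terminology checks.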
3. Granularity
Data categories should be defined so that they are as finely granular as possible. Thus, it is better to create several categories such as “gender”, “number” and “part of speech” instead of one general category called “grammar”. Another negative example is the category “Remark”, in which users tend to collect everything they consider important about a term or concept. In spreadsheet programs, the desire to keep things uncluttered and to show an entire table on a single screen often tempts users to simplify. Such simplifications, however, make it difficult to import the data into databases, which are usually finely granular in structure. Those who refine the data consistently will also benefit more from the search and filter functions.
The source should be refined in the same way: does it refer to a term, a definition or a context sentence? Even if the source of the term and the source of the definition or context sentence are usually identical when the terminology is first collected, this can change as the terminological entry undergoes further processing.
Fig. 2: Violations of the principles of elementary nature, granularity and autonomy of terms
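The benefit of fine granularity for filtering can be shown with a minimal sketch. The column names and the helper function below are hypothetical, and the rows are sample data chosen for illustration.

```python
# Each grammatical property has its own column instead of one "grammar" column.
rows = [
    {"term": "Bank", "gender": "fem.", "number": "singular", "part_of_speech": "noun"},
    {"term": "Kupplung", "gender": "fem.", "number": "singular", "part_of_speech": "noun"},
    {"term": "Bildschirm", "gender": "masc.", "number": "singular", "part_of_speech": "noun"},
    {"term": "kuppeln", "gender": "", "number": "", "part_of_speech": "verb"},
]

def filter_rows(rows, **criteria):
    """Keep only rows whose columns match all given values exactly."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]

feminine_nouns = filter_rows(rows, gender="fem.", part_of_speech="noun")
print([row["term"] for row in feminine_nouns])
```

With a single free-text "grammar" or "Remark" column, the same query would require error-prone text matching.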
4. Autonomy of terms
One of the more recent requirements placed on terminology work is to capture synonyms in the same way as equivalents. For every term, a complete set of data categories describing it should be available. Only in this way can terminology that is still in the consensus stage, that is, terminology that has not yet been classified as “allowed” or “banned”, be managed at all.
For this, it is important to use a separate row for every term, including abbreviations, in which the subsequent data categories can be filled if necessary. An additional column with the language code is introduced for managing multilingual data.
Fig. 3: The three levels of the concept, language and term in the CSV-Import/Export in conceptXplorer
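A table with one row per term and a separate language column might be generated as in the following sketch. The column names, the semicolon delimiter and the status values are assumptions made for illustration. They do not reproduce the exact column set of the DTT standard or of any tool.

```python
import csv
import io

FIELDS = ["concept_id", "language", "term", "term_type", "status"]

# One row per term, including the abbreviation; an empty status means
# that the term has not yet been classified.
rows = [
    {"concept_id": 1, "language": "en", "term": "translation memory",
     "term_type": "full form", "status": "allowed"},
    {"concept_id": 1, "language": "en", "term": "TM",
     "term_type": "abbreviation", "status": ""},
    {"concept_id": 1, "language": "de", "term": "Übersetzungsspeicher",
     "term_type": "full form", "status": "allowed"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter=";", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

def synonyms(rows, concept_id, language):
    """All terms of one concept in one language."""
    return [r["term"] for r in rows
            if r["concept_id"] == concept_id and r["language"] == language]

print(synonyms(rows, 1, "en"))
```

Since every term has its own row, each one can carry its own status, type and source, and the concept ID keeps synonyms and equivalents together.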
5. Avoid defining too many mandatory fields
Some data are difficult to specify, such as the definition and the technical domain. Others are easier to fill in: in German, for instance, this applies to the gender (masc., fem., neutr.), number (singular, plural) and part of speech (noun, verb, adjective).
However, this should not tempt users to fill in the easily researched data categories for all entries. What is important is to be able to use these data categories in cases of doubt, such as for brand names in German (der/die/das Nutella) and verbs and nouns that are identical in English (cut – to cut).
If all these details are taken into account, the result is an Excel list which is a) multilingual, b) concept-oriented, c) autonomous with respect to terms, d) pragmatic and e) compliant with the principles of elementary nature and granularity.
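A simple check can enforce a small set of mandatory fields while leaving everything else optional. Which fields are mandatory is a project decision; the three chosen in this sketch are an assumption, not a recommendation from the standard.

```python
# Assumed minimal set of mandatory fields; all other columns stay optional.
MANDATORY = ("concept_id", "language", "term")

def missing_mandatory(row):
    """Return the names of mandatory fields that are absent or empty."""
    return [name for name in MANDATORY if not str(row.get(name, "")).strip()]

complete = {"concept_id": 7, "language": "de", "term": "Nutella", "gender": ""}
incomplete = {"concept_id": 7, "language": "", "term": "cut"}

print(missing_mandatory(complete))
print(missing_mandatory(incomplete))
```

Optional categories such as gender remain available for cases of doubt without blocking the entry when they are left empty.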
CSV in TermStar
TermStar/WebTerm is the terminology management component from Star, the manufacturer of the translation memory system Transit. In the latest version, “Transit NXT”, the component supports the XML-based exchange format TBX; in the predecessor version, “Transit XV”, the format was MARTIF (SGML-based).
To import or export tables that are created in a concept-oriented manner with term autonomy in the CSV-format, however, the user should use the exchange format TermStar 2.6. In this older export and import format, the data categories that have been specified in a fixed manner by Star also follow a strict sequence.
First, all the information pertaining to the concept is listed in one row containing a blank language code “[]”. Information at the level of the term, namely the term itself and other details such as the usage or context, is placed in one row together with the relevant language code. In this import/export format, the information pertaining to the concept and the various terms in a given language (= synonyms) or in several languages (= equivalents) is kept together by means of the concept ID.
To this extent, it is possible to exchange tabular lists that are designed in a concept-oriented and term-autonomous manner using TermStar. Unfortunately, the double use of columns (concept-related data categories in row 1, term-related ones in row 2) makes it impossible to use the excellent filter functions of spreadsheet applications.
Here, it would be better to separate the data categories of the concept level (shown yellow and bright yellow in Figure 4) from those at the level of the term (shown green and light green in Figure 4) and to place the concept-level categories in front of the term-level categories as separate columns.
Fig. 4: Simplified representation of the export/import in the TermStar 2.6 format
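The separation suggested above can be sketched as a small conversion step. The code only relies on what is described here: concept rows carry the language code "[]", precede their term rows and share a concept ID with them. All column names and sample values are hypothetical and do not reproduce the actual TermStar 2.6 field list.

```python
def flatten(rows):
    """Copy concept-level fields into every term row of the same concept."""
    concepts = {}
    flat = []
    for row in rows:
        if row["language"] == "[]":
            # Concept row: remember its fields, without the placeholder code.
            concepts[row["concept_id"]] = {
                key: value for key, value in row.items() if key != "language"
            }
        else:
            # Term row: concept-level columns first, then term-level columns.
            flat.append({**concepts.get(row["concept_id"], {}), **row})
    return flat

mixed_rows = [
    {"concept_id": 1, "language": "[]", "definition": "sample definition", "domain": "sample domain"},
    {"concept_id": 1, "language": "de", "term": "Kupplung", "usage": "preferred"},
    {"concept_id": 1, "language": "en", "term": "clutch", "usage": "preferred"},
]

for flat_row in flatten(mixed_rows):
    print(flat_row)
```

The result has one row per term with concept-level and term-level categories in separate columns, so that spreadsheet filters work again.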
CSV in conceptXplorer
In conceptXplorer from infolox, the import/export of a CSV file expects the data categories of the concept level first, followed by those of the language level and then the term level, as required by the DTT standard (see Figure 3). One advantage of this approach is that the complex contents of a terminology database are represented completely in a table that meets all the requirements of modern terminology management.
One disadvantage of this approach is that the data categories at the level of the concept and the language, which apply to several terms, have to be repeated in every row. Users should, however, resist the temptation to simply merge these cells for the sake of clarity, because tables with merged cells generally cannot be imported correctly.
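If a table has already been simplified so that concept-level values appear only in the first row of each concept (which is typically what remains of merged cells after saving as CSV), the values can be filled down again before the import. The following is a minimal sketch under that assumption; the function and column names are hypothetical.

```python
def fill_down(rows, key, columns):
    """Repeat concept-level values in every row of the same concept."""
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(key):
            # A non-empty key starts a new concept: forget the old values.
            last = {}
        for column in (key, *columns):
            if new_row.get(column):
                last[column] = new_row[column]
            else:
                new_row[column] = last.get(column, "")
        filled.append(new_row)
    return filled

sparse = [
    {"concept_id": "1", "definition": "definition 1", "term": "term a"},
    {"concept_id": "", "definition": "", "term": "term b"},
    {"concept_id": "2", "definition": "", "term": "term c"},
]

for filled_row in fill_down(sparse, "concept_id", ["definition"]):
    print(filled_row)
```

Resetting the remembered values at each new concept ID prevents a concept without a definition from inheriting the definition of the previous concept.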
CSV in crossTerm
The export and import function in crossTerm, the terminology management component of Across, is similar in structure to that of conceptXplorer. Here, again, the data categories, which can be defined freely by the user, are arranged one after the other at the level of the concept and the term, and the terms are assigned to each other through concept IDs. However, this applies only to synonyms, that is, terms in one language, and not to equivalents, that is, terms in other languages.
In our case, this could result in a situation where blank rows appear if the number of terms in the various languages is different. In our example shown in Figure 5, for instance, the concept having the ID 1 has three synonyms assigned to it in German, but only two in English. Hence, in row 4, there are no more terms in English. Furthermore, complex lists with 27 or more languages and with several attributes are hardly manageable in this representation, except in the case of a bilingual translation perspective.
Fig. 5: Simplified representation of the CSV-Export/Import in crossTerm
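How the blank rows arise can be reproduced with a short sketch that places the terms of each language side by side. The function name and the placeholder terms are hypothetical; only the shape of the table (three German synonyms, two English ones) follows the example described above.

```python
from itertools import zip_longest

def side_by_side(terms_by_language, languages):
    """Build one column per language; shorter columns are padded with ''."""
    columns = [terms_by_language.get(language, []) for language in languages]
    return [list(row) for row in zip_longest(*columns, fillvalue="")]

concept_1 = {
    "de": ["de-term-1", "de-term-2", "de-term-3"],
    "en": ["en-term-1", "en-term-2"],
}

for table_row in side_by_side(concept_1, ["de", "en"]):
    print(table_row)
```

The last row contains a German term but an empty English cell. With every additional language and attribute the table grows wider and the number of such gaps increases.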
CSV in MultiTerm
MultiTerm in SDL Trados offers a tab-separated export interface. Its functionality is limited, however, since it does not export any blank fields. Only the filled fields in each row are exported; as a result, the contents of the data categories may no longer line up in the same columns across rows.
It is therefore better to create separate export definitions in MultiTerm, each of which exports specific data categories. Without specific scripts that merge the various exports, however, this cannot be completed in one single export transaction. In the latest version, MultiTerm 2009, it is now possible to include the entry number when importing XLS files through MultiTerm Convert and hence to synchronize the data using the entry numbers.
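A merge script of the kind mentioned above could look roughly like the following sketch. It assumes that each separate export has already been read into a list of (entry number, value) pairs, one list per data category. This input shape and all names are assumptions and do not describe an actual MultiTerm file format.

```python
def merge_exports(exports):
    """Merge per-category exports into one row per entry number.

    exports: {category_name: [(entry_number, value), ...]}
    """
    categories = list(exports)
    merged = {}
    for category, pairs in exports.items():
        for entry_number, value in pairs:
            # Every row gets all categories, so empty fields stay in place.
            row = merged.setdefault(entry_number, {name: "" for name in categories})
            row[category] = value
    return [{"entry_number": number, **merged[number]} for number in sorted(merged)]

exports = {
    "term": [(1, "term a"), (2, "term b")],
    "definition": [(2, "definition b")],
}

for merged_row in merge_exports(exports):
    print(merged_row)
```

Because every row is initialized with all categories, an entry without a definition keeps an empty definition field instead of shifting the following values to the left.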
Summary
Terminology can be managed so as to conform to the latest standards even using Excel. The requirements for setting up these tables are basically the same as those that apply to the conceptualization of terminology databases in modern terminology management systems. However, not all applications implement the standard for import and export formats to the same extent. This is because very often, the CSV-Export in particular is one of the “oldest” formats in these tools and usually still follows a dictionary concept. However, this can be changed.
Links
[1] Deutscher Terminologie-Tag: www.dttev.org







Since 2004, Dr. Rachel Herwartz has been leading the company “TermSolutions – Terminology Solutions & Services”. The company advises industrial clients and medium-sized companies on terminology management and translation. Dr. Herwartz is a lecturer at the Donau-Uni Krems Hochschule and at the Zürcher Hochschule Winterthur. Her lectures and technical articles deal with issues related to terminology.