Background information

The development and use of machine translation systems and
computer-based translation tools

John Hutchins

Part 3

Production of technical documentation
Controlled language and domain-specific systems

Production of technical documentation

Until the 1990s the normal assumption was that MT systems were intended to be used for the production of documentation of publishable quality, primarily but not exclusively of a scientific and technical nature. The assumption was, in other words, that MT systems were to be used in conditions where human translators with expertise in the subjects concerned would otherwise be employed. Evidently, the actual quality of MT output was inadequate for direct use. It had to be extensively revised before it could be published, and translators were therefore employed as ‘post-editors’. In these circumstances, the use of MT became a matter of economics. It was viable only if overall quality and speed could be achieved at lower cost than the employment of human translators.

Although today there are other uses for MT, as we have already indicated, this application remains the most important, particularly for the vendors and developers of the larger ‘mainframe’-type systems (Systran and Logos). The main customers and users are the multinational companies exporting equipment in the global market (Vasconcellos 1993; Brace et al. 1995). The need here is for translation of promotional and technical documentation. In the latter case, technical documents are often required in very large volumes: a set of operational manuals for a single piece of equipment may amount to several thousand pages. Furthermore, there can be frequent revisions with the appearance of new models. In addition, there must be consistency in translation: the same component must be referred to and translated the same way each time. This scale of technical translation is well beyond human capacity. Nevertheless, in order to be most cost-efficient, an MT system should be well integrated within the overall technical documentation processes of the company: from initial writing to final publishing and distribution. Systems developed for the support of technical writers – not just assistance with terminology, but also on-line style manuals and grammar aids – are now being linked seamlessly into translation and publishing processes.

There are numerous examples of the successful and long-term use of MT systems by multinationals for technical documentation. One of the best known is the application of the Logos systems at the Lexi-Tech company in New Brunswick, Canada. Starting with the translation into French of manuals for the maintenance of naval frigates, the company has built up a service undertaking many other large translation projects. Also using Logos are Ericsson, Osram, Océ Technologies, SAP and Corel. Systran has many large clients: Ford, General Motors, Aérospatiale, Berlitz, Xerox, etc. The METAL German-English system has been successfully used at a number of European companies: Boehringer Ingelheim, SAP, Philips, and the Union Bank of Switzerland.

A pre-requisite for successful MT installation in large companies is that the user expects a large volume of translation within a definable domain (subjects, products, etc.). The financial commitment to a terminology database and to dictionary maintenance must be justifiable. Whether produced automatically or not, it is desirable for company documentation to be consistent in the use of terminology. Many companies in fact insist upon their own use of terms, and will not accept the usage of others. To maintain such consistency is almost impossible outside an automated system. However, it does mean that before an MT system can be installed, the user must have already available a well-founded terminological database, with authorised translation equivalents in the languages involved, or – at least – must make a commitment to develop the required term bank.

For similar reasons, it is often desirable if the MT system is to produce output in more than one target language. Most large-scale MT systems have to be customised, to a greater or lesser extent, for the kind of language found in the types of documents produced in a specific company. This can involve the addition of specific grammatical rules to deal with frequent sentence and clause constructions, as well as the inclusion of specific rules for dealing with lexical items, and not just those terms unique to the company. The amount of work involved in such customisation may not be justifiable unless output is in a number of different languages.
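
The terminology requirement described above, that the same component is always translated the same way, can be illustrated with a small consistency check. This is only a minimal sketch: the term bank entries, the function name `check_terminology`, and the English-French pairing are assumptions made for illustration, and it does not describe how any of the systems named here actually work.

```python
import re

# Hypothetical term bank: source term -> authorised French equivalent.
# A real term bank would be far larger and maintained by terminologists.
TERM_BANK = {
    "fuel pump": "pompe à carburant",
    "relief valve": "soupape de décharge",
}


def check_terminology(source, target, term_bank=TERM_BANK):
    """Return (term, authorised equivalent) pairs that occur in the source
    sentence but whose authorised equivalent is missing from the target."""
    violations = []
    src = source.lower()
    tgt = target.lower()
    for term, equivalent in term_bank.items():
        # Match the source term as a whole phrase, not inside a longer word.
        in_source = re.search(r"\b" + re.escape(term) + r"\b", src)
        if in_source and equivalent.lower() not in tgt:
            violations.append((term, equivalent))
    return violations
```

A post-editor or a batch job could run such a check over each sentence pair and flag translations that depart from the company's authorised terms, whether the translation was produced by a person or by an MT system.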

Controlled language and domain-specific systems

In these circumstances, however, it has often been found feasible to introduce a greater degree of control. One of the earliest and best known examples is the application of the Systran system by the Xerox Corporation. At Xerox technical authors are obliged to compose documents in what is called Multinational Customized English, where not only the use of specific terms is laid down but also the construction of sentences (Elliston 1979). The advantages of this approach are: the avoidance of ambiguities in the input which the MT system cannot deal with adequately, the consequential better quality output, the faster production of technical documents simultaneously in a number of different languages, and (not least) the production of more easily comprehensible English documents. These advantages have been recognised by other multinational companies, and the use of ‘controlled languages’ is increasing: for example, the Caterpillar Corporation has devised its own form of English to facilitate translation in a knowledge-based MT system being developed for it at Carnegie-Mellon University (Mitamura and Nyberg 1995).

There are some companies offering to build ‘controlled’ language MT systems for specific clients. The oldest established – and the pioneer in this approach – is the Smart Corporation, New York. Systems have been developed by Smart for a number of major clients: Citicorp, Chase, Ford, General Electric, etc. Each incorporates a system for ‘normalising’ English documents. This system component is considered so crucial to success that the actual translation process is regarded as virtually a ‘by-product’ (Lee 1994). There are Smart systems translating into French, German, Greek, Italian, Japanese, and Spanish. The largest Smart installation, perhaps, is the system designed for the Canadian Ministry of Employment, where it has been used for many years to translate information about job advertisements and similar documentation.
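
The idea of a controlled language, in which both vocabulary and sentence construction are restricted before translation, can be sketched as a simple authoring check. The approved word list, the length limit, and the function name `check_sentence` below are invented for illustration; they are not the rules of Multinational Customized English or of any other controlled language mentioned here.

```python
import re

# Hypothetical approved vocabulary and sentence-length limit.
APPROVED_WORDS = {
    "remove", "the", "cover", "and", "replace", "filter", "close", "valve",
}
MAX_WORDS = 8


def check_sentence(sentence, approved=APPROVED_WORDS, max_words=MAX_WORDS):
    """Return a list of problems found in one sentence; empty if it conforms."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = []
    # Rule 1: keep sentences short to limit structural ambiguity.
    if len(words) > max_words:
        problems.append(f"too long: {len(words)} words (limit {max_words})")
    # Rule 2: only words from the approved vocabulary may be used.
    for word in words:
        if word not in approved:
            problems.append(f"unapproved word: {word}")
    return problems
```

A sentence such as "Remove the cover and replace the filter." passes both rules, while "Detach the cover." is flagged because "detach" is not in the approved list. Real controlled languages also constrain grammar, which a word-level check like this does not attempt.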

In Europe, the Cap Volmac company in the Netherlands and the LANT company in Belgium offer similar services, building for various clients specialised translation systems utilising their own software for controlled languages. Cap Volmac Lingware Services is a Dutch subsidiary of the Cap Gemini Sogeti Group. Over the years this software company has constructed controlled-language systems for textile and insurance companies, mainly from Dutch to English (Van der Steen and Dijenborgh 1992). However, possibly the best known success story for custom-built MT is the PaTrans system developed for LingTech A/S to translate English patents into Danish. The system is based on methods and experience gained from the Eurotra project of the European Commission (Ørsnes et al. 1996).

These last examples of systems illustrate that a growing number of companies and organisations are developing their own MT facilities, as opposed to purchasing commercial systems. This has been a feature from early days. The successful Météo system in Canada for translating weather forecasts from English into French (and later from French into English) was effectively a customer-specific system – in this case the Canadian Environment service. It may be noted that a variant of the Météo software was successfully operated during the Olympic games in Atlanta (Chandioux and Grimaila 1996). Météo is an example of a ‘sublanguage’ system, i.e. designed to deal with the particular language of meteorology.

Another example of a custom-built system is TITUS, a highly constrained ‘sublanguage’ system for translating abstracts of documents of the textile industry from and into English, French, German, and Spanish, in regular use since 1970. Better known are the two customer-specific systems for translating between English and Spanish built at the Pan American Health Organization in Washington – designed and developed by workers in the organisation itself. These highly successful systems (now also available to users outside PAHO) are general-purpose systems, not constrained in vocabulary or text type, although obviously the dictionaries are strongest in the health-related social science fields (Leon and Aymerich 1997).

In the 1990s there have been a number of other examples. In Finland, the Kielikone system was developed originally as a workstation for Nokia Telecommunications. Subsequently, versions were installed at other Finnish companies and the system is now being marketed more widely (Arnola 1996). A similar story applies to GSI-Erli. This large language engineering company developed an integrated in-house translation system combining an MT engine and various translation aids and tools on a common platform, AlethTrad. Recently it has been making the system available in customised versions for outside clients (Humphreys 1996).

On a smaller scale, but equally successful, has been the system developed by the translation service of a small British company, Hook and Hatton. In this case, the need was for translation of chemical texts from Dutch into English (Lewis 1997). The designer began with simple pattern matching of phrases, and gradually built in more syntactic analysis as and when results were justifiable and cost-effective.
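
As a rough illustration of what a starting point based on simple phrase pattern matching might look like, the following sketch replaces known Dutch phrases with English equivalents and leaves everything else untouched. The phrase table and the function name `translate_phrases` are assumptions made for this example; nothing here is taken from the system itself.

```python
import re

# Hypothetical Dutch -> English phrase table for chemical texts.
PHRASE_TABLE = {
    "een oplossing van": "a solution of",
    "natriumchloride": "sodium chloride",
    "in water": "in water",
}


def translate_phrases(text, table=PHRASE_TABLE):
    """Replace every phrase found in the table; leave other text unchanged."""
    # Try longer phrases first so a multi-word entry wins over shorter ones.
    phrases = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: table[m.group(1).lower()], text)
```

Words that are not in the table pass through untranslated, which shows why such an approach would need more syntactic analysis added over time as coverage requirements grow.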

Based on experience over many years in developing knowledge-based MT and experimenting with speech translation and corpus-based methods, members of the group at Carnegie-Mellon University have developed an architecture for the rapid production of usable MT systems for specific clients in some less common languages, such as Serbo-Croat and Haitian Creole (Frederking et al. 1997). There is no pretence of high quality, merely ‘usefulness’ for languages otherwise inaccessible.

Another example of custom-built MT in a specialised area is the program developed for TCC Communications at Simon Fraser University for translating closed captions on television programs (Toole et al. 1998). Not only are there time constraints – translation must be in real-time – but also there are the challenges of colloquialisms, dialogue, robustness, and paucity of context indicators. The system, at present running live for English into Spanish, demanded techniques otherwise found mainly in Internet applications (see below).

In Japan, there are further examples of custom-built systems. The Japan Information Centre of Science and Technology translates abstracts of Japanese scientific and technical articles into English. In the late 1980s it assumed responsibility for the Mu Japanese-English MT system developed at the University of Kyoto. From this, it now has one of the largest MT operations in Japan (O’Neill-Brown 1996). Other custom-built systems of significance in Japan are the SHALT system developed by IBM Japan for its own translation needs, the ARGO system developed by CSK in Tokyo for translating Japanese stock market reports into English, and the NHK system for translating English news articles into Japanese.
