Dictionaries

Our experience shows that the more carefully the dictionaries in a machine translation system are customized, the higher the translation quality and the easier it is for users to edit or read the resulting text. For this reason, our centre has created its own specialized dictionaries for the following topics:

  • hardware & software
  • telecommunications (systems, networks, services)
  • power supply systems
  • internet
  • mechanics

The actual number of dictionaries is much larger than the number of topics listed, because a single topic can have several dictionaries. Telecommunications, for example, has separate dictionaries for switching, transmission systems, intelligent networks, ATM, mobile communication, etc. In addition, new dictionaries can be created and existing ones customized according to terminology requirements or customer demands.

Contents of specialized dictionaries

Usually, the general vocabulary needed for translation is already available in the general dictionary, which is supplied together with the machine translation system. The following information is entered into specialized dictionaries:

  • basic terminology (for example, for such subjects as hardware and software, telecommunications, mechanics, power supply, etc.)
  • terminology specific to the documentation being translated (names of modules, units, devices, and programs; expansions of abbreviations; specific slang, etc.)
  • frequently occurring expressions, phrases, and sentences (so-called microsegments), for example:
      • it is assumed that
      • unless otherwise specified
      • note that
      • the following window appears
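
One way to picture how these layers fit together is a lookup that consults the most specific dictionary first and falls back to the general one. The sketch below is only an illustration: the priority order, the dictionary names, and the placeholder glosses are all assumptions and do not describe how any particular machine translation system stores its dictionaries.

```python
from collections import ChainMap

# Hypothetical entries; the values are placeholder glosses, not real translations.
general = {"window": "<general: window>", "list": "<general: list>"}
telecom = {"switching": "<telecom: switching>", "list": "<telecom: list>"}
project = {"note that": "<project: note that>"}  # a microsegment

# Assumed priority: project-specific entries, then the topic dictionary,
# then the general dictionary. ChainMap searches the maps in this order.
lexicon = ChainMap(project, telecom, general)


def look_up(term):
    """Return the highest-priority gloss for a term, or None if it is unknown."""
    return lexicon.get(term.lower())


print(look_up("list"))       # found in the topic dictionary first
print(look_up("note that"))  # found in the project dictionary
```

With this layout, adding or correcting an entry in a specialized dictionary changes the result without touching the general dictionary.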

Sources of terms for specialized dictionaries

  • international recommendations and standards (ITU-T, ETSI)
  • existing dictionaries (hardcopy and electronic)
  • articles in periodicals that contain terminology (with explanations)
  • manufacturers' explanatory dictionaries of terms and abbreviations
  • the documentation itself

Difference between general and specialized dictionaries

If you pick up any general-purpose dictionary (for example, English-Russian, Russian-German, or Spanish-Italian), you will see that most entries are single words (common nouns, adjectives, adverbs, and verbs). Word collocations always make up a smaller share.

Specialized dictionaries (for example, polytechnical, legal, or computer technology dictionaries) are different. As a rule, their main terms are word collocations of two or more words.

The figure below shows the composition of our dictionaries for machine translation systems: the percentage of dictionary entries with one, two, and three or more words.

Whereas in the general (general-purpose) dictionary at least two thirds of the entries are single words, in specialized and user dictionaries the main share (over 50%) consists of multi-word collocations.
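
A breakdown of this kind can be computed from any list of dictionary headwords. The following is a minimal sketch, assuming entries are plain strings whose words are separated by whitespace; the function name and the four sample entries are made up for illustration and are not figures from our dictionaries.

```python
from collections import Counter


def length_profile(entries):
    """Return the percentage of entries with 1, 2, and 3 or more words."""
    # Cap the word count at 3 so that longer entries fall into the "3+" bucket.
    counts = Counter(min(len(e.split()), 3) for e in entries if e.split())
    total = sum(counts.values())
    if total == 0:
        return {"1": 0.0, "2": 0.0, "3+": 0.0}
    buckets = ((1, "1"), (2, "2"), (3, "3+"))
    return {label: 100.0 * counts[n] / total for n, label in buckets}


# Illustrative sample only.
sample = ["list", "device list", "intelligent network", "asynchronous transfer mode"]
print(length_profile(sample))
```

Running the same function over a general dictionary and over a specialized one makes the difference in their composition directly comparable.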

Dictionary customization

High-quality dictionary customization requires you to:

  • know the subject matter of the translation
  • know the basic grammar rules of both languages
  • continuously extend and update the dictionary (the ultimate goal is for the system output to be "predictable")

In their content, dictionaries for machine translation systems are similar to conventional ones (for example, printed dictionaries). However, they have some specific features:

  • Word collocations must be entered in their plural form.
  • Homonyms have to be dealt with explicitly. Take the term "list", which can be either a verb or a noun. If the word collocation "device list" is not in the dictionary, the phrase "device lists" may be translated in a way that is grammatically correct but wrong in meaning. In that case, the word collocation must be entered into the dictionary in the plural and marked accordingly.
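
The effect of entering the plural collocation can be sketched with a longest-match lookup: when the multi-word entry exists, it is matched as a whole and the ambiguous single word "lists" is never looked up on its own. This is a simplified illustration, not the algorithm of any actual system; the entry table, the placeholder tags, and the function name are assumptions.

```python
# Hypothetical entries; the values are placeholder tags, not real translations.
ENTRIES = {
    ("device",): "NOUN:device",
    ("list",): "NOUN-or-VERB:list",
    ("lists",): "NOUN-or-VERB:lists",
    ("device", "list"): "TERM:device list (singular)",
    ("device", "lists"): "TERM:device list (plural)",
}


def segment(text, entries=ENTRIES):
    """Greedy longest-match segmentation of a phrase against the entries."""
    words = text.lower().split()
    max_len = max((len(key) for key in entries), default=1)
    result, i = [], 0
    while i < len(words):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in entries:
                result.append(entries[key])
                i += n
                break
        else:
            # No entry of any length starts at this word.
            result.append("UNKNOWN:" + words[i])
            i += 1
    return result


# With the plural collocation entered, the phrase is matched as one term.
print(segment("device lists"))

# Without it, the lookup falls back to the ambiguous single word "lists".
reduced = {k: v for k, v in ENTRIES.items() if k != ("device", "lists")}
print(segment("device lists", reduced))
```

The second call shows the situation described above: each word is found, so the output is well formed, but "lists" is left ambiguous between noun and verb.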

© "Argonaut" Ltd. 2002