Pitfalls of Machine Translation of Journal Articles
Human Translation or Machine Translation?
When confronted with a research paper written in a language other than English, it can be very tempting to use one of the free machine translators to produce a version in your native language. Although the translation is fast, even the most advanced machine translation tools can’t cope with some aspects of academic papers, such as neologisms and quotes. Recognizing what the machines struggle with when translating academic papers can help a researcher decide whether a machine translation is good enough for understanding the gist of a paper, or whether the expertise of a human translator is needed, for example when the translation is intended for publication.
It is critical for researchers to understand the distinctions between machine and human translation, which are more than just technical. Removing the human from the translation process introduces a number of pitfalls, specifically in handling jargon, neologisms, quotes, and graphs.
Machine translation and context
Language for Special Purposes, or LSP, is Translation Studies jargon for ‘jargon’. Every academic discipline has its own jargon and gives certain words a meaning they don’t have outside that field.
An excellent example of this comes from the field of Religious Studies and the German and English words used to describe the cloistered life of nuns in the Middle Ages. The English word is “enclosure” (also: cage, box); the German is “Klausur” (also: exam, test). Depending on the direction of translation, the machine may well describe a woman living in a cage or in an exam. When translating the standalone sentence “Diese Klausur war nicht einfach.”, DeepL provides two alternatives: “This retreat was not easy.” or “This exam was not easy.” In practice, with terms like these, the researcher commissioning the translation is likely to be able to choose which of the two options applies in their circumstances.
In other situations, the reader may first learn that such alternatives exist when a sentence in the machine translation does not make sense in context. One of the critical skills of a translator, and one that a machine cannot replicate, is selecting the most appropriate word from the list in the dictionary. A human translator chooses the word that fits the context, so the translation contains no sentences that fail to make sense.
By its very nature, research creates new terminology as it creates new knowledge. Because the translation engines are built around existing texts, newly created words, newly coined expressions, and new interpretations of existing words don’t feature in their data sets. Some translation machines don’t translate such words at all and deliver the same word in the output as was in the input, the way they would a name. This can make the translation hard to understand. In a recent example of post-editing work from German to English, the German word “transident” consistently appeared in the English version, rather than the English term “transgender”. Google Translate (which had been used on this occasion) presumably had no reference for “transident”, so it copied the word directly into the English. Linguee, the dictionary related to the DeepL machine translation tool, also has no entry for “transident”, although it does identify three locations where the word is used and the English counterpart is “transgender”.
This transfer of an unknown term as if it were a name would not arise with a human translator. Along with knowing which dictionary term to use, human translators are also familiar with emerging terms in their subject area for which no dictionary entry exists yet.
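For readers who want a quick way to spot this kind of carry-over in a machine translation, the idea can be sketched in a few lines of Python. This is only a rough screening aid under stated assumptions: the function name carried_over_terms, the known_names parameter, the minimum word length, and both example sentences are invented for illustration and do not come from any translation tool.

```python
import re


def carried_over_terms(source_text, translated_text, known_names=frozenset(), min_length=5):
    """Return words that appear unchanged in both the source and the translation.

    Hypothetical helper: such words may be names, but they may also be terms
    the machine could not translate and simply copied across.
    """
    def words(text):
        # Letters only (including umlauts and other accented letters), lower-cased.
        return {w.lower() for w in re.findall(r"[^\W\d_]+", text)}

    shared = words(source_text) & words(translated_text)
    allowed = {name.lower() for name in known_names}
    # Skip short words, which coincide between languages by chance ("in", "so").
    return sorted(w for w in shared if len(w) >= min_length and w not in allowed)


# Invented example sentences, for illustration only.
source = "Die Befragten in Berlin bezeichneten sich als transident."
machine_output = "The respondents in Berlin described themselves as transident."

print(carried_over_terms(source, machine_output, known_names={"Berlin"}))
```

Words that are legitimately identical in both languages will also be flagged, so the result is a list of candidates for a human to check, not a verdict on the translation.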
Tables and graphs
Tables and graphs in research papers are usually treated as images by machine translation tools, and the text within them is not translated at all. Human translators will translate every word they are given, regardless of how it is presented. Depending on the academic discipline and language combination, an untranslated table may not cause much of a problem. Reading the translated text around it may provide enough context to understand the descriptions. In other cases, it may mean a key part of the work remains unintelligible.
Qualitative research results in the humanities often include verbatim quotes from research subjects. These also create an obstacle for systems trained on written texts. Assessing whether a teenager means something is good or bad when they say “sick”, for example, requires the sensibility of a human translator.
Another form of quote, one taken from a cited paper, may also not translate back into its original wording. It is not unusual for quotes to be translated into the language of the paper being written. When that paper is then translated into the language in which the quote was originally written, the likelihood of the machine translation selecting precisely the same words as the cited author is very low. This again is where a translator closely involved with the subject matter can provide a better translation, because they are likely to locate the precise original quote.
Understand the risks before making a decision
To summarise, machine translation can help you understand more than just the abstract of a research paper. However, care needs to be taken when doing so. How far the limitations of machine translation affect the reader depends largely on the subject matter and on the extent to which the above pitfalls feature in the paper. A more reliable version will always require some intervention by a human who is an expert in the academic subject matter, be it post-editing or a full translation.