Machine translation for everyone
Empowering users in the age of artificial intelligence

Edited by Dorothy Kenny

Translation and Multilingual Natural Language Processing 18

Language Science Press
Translation and Multilingual Natural Language Processing
7. Hansen-Schirra, Silvia, Oliver Czulo & Sascha Hofmann (eds.). Empirical modelling of
translation and interpreting.
8. Svoboda, Tomáš, Łucja Biel & Krzysztof Łoboda (eds.). Quality aspects in institutional
translation.
9. Fox, Wendy. Can integrated titles improve the viewing experience? Investigating the impact
of subtitling on the reception and enjoyment of film using eye tracking and questionnaire
data.
10. Moran, Steven & Michael Cysouw. The Unicode cookbook for linguists: Managing writing
systems using orthography profiles.
12. Nitzke, Jean. Problem solving activities in post-editing and translation from scratch: A
multi-method study.
15. Tra&Co Group (ed.). Translation, interpreting, cognition: The way out of the box.
18. Kenny, Dorothy (ed.). Machine translation for everyone: Empowering users in the age of
artificial intelligence.
ISSN: 2364-8899
Dorothy Kenny (ed.). 2022. Machine translation for everyone: Empowering users
in the age of artificial intelligence (Translation and Multilingual Natural
Language Processing 18). Berlin: Language Science Press.
ISSN: 2364-8899
DOI: 10.5281/zenodo.6653406
Source code available from www.github.com/langsci/342
Errata: paperhive.org/documents/remote?type=langsci&id=342
Contents

Introduction (Dorothy Kenny) v
Index 208
Acknowledgments
This book has been written with the support of the European Union’s Erasmus+
strategic partnership programme, as part of the project known as “MultiTraiNMT:
Machine Translation training for multilingual citizens” (project ID:
2019-1-ES01-KA203-064245).
Introduction
Dorothy Kenny
Dublin City University
In this Introduction I set out the rationale for this book and suggest ways in which
readers might approach the material it contains.
casually in language learning contexts; and people who are either already work-
ing as translators or training to become translators. We take the view that all
users of machine translation should have some basic understanding of why the
technology is important, and where it fits into the maintenance of multilingual
regimes. And all users should have some basic understanding of how the tech-
nology works, so they can use it intelligently and avoid common pitfalls. Some
users, who may wish to engage more deeply with the technology, may bene-
fit from knowing how to get the best out of machine translation, for example,
by writing texts in a way that makes them easier to translate by machine. The
same users might also be interested in ways to improve machine translation out-
puts. Those working in, or about to join, the translation industry will have a
particular interest in evaluating machine translation output, in order to gauge
whether it is “fit for purpose”. They might even get involved in integrating ma-
chine translation into the workflow of their company or need to know how to
customize machine translation so that they can better serve the needs of particu-
lar clients. They will also be interested in how machine translation might impact
on their working conditions. Such readers require more in-depth knowledge of
the technology itself, and of the techniques and tools they can use to implement
it. All users should, though for different reasons, have some basic knowledge of
the ethical issues that arise when we use machine translation. Some users may be con-
cerned about the possibility of cheating: in what cases might the use of machine
translation constitute a breach of trust in educational environments, for exam-
ple? Others, mainly professional translators, may have to consider how the use
of certain types of machine translation might constitute a breach of contract.
And everybody has to be concerned these days about protecting the privacy and
data rights of others. Contemporary machine translation is also one of the many
technologies that can be implicated in processes that degrade our natural envi-
ronment. And it has been known to produce biased outputs, preferring male to
female forms, for example. Like all communication technologies, it can be used
for nefarious causes or positive humanitarian purposes. These are issues that
concern us all.
Accompanying resources
Each chapter of this book is accompanied by a set of interactive activities acces-
sible through the MultiTraiNMT website at http://www.multitrainmt.eu/. A per-
manent link to these activities can be found at https://ddd.uab.cat/record/257869.
Most of the activities can be completed on a self-access basis, although some will
benefit from the guidance of a teacher.
A special pedagogical platform known as MutNMT has also been created as
part of the MultiTraiNMT project. It is designed to help users learn how to
train, customize and evaluate neural machine translation systems. It is accessi-
ble through the MultiTraiNMT website, and will be of particular significance to
readers of Chapters 7 and 8 of this book.
Chapter 1
Europe, multilingualism and machine
translation
Olga Torres-Hostench
Universitat Autònoma de Barcelona
This chapter explains multilingualism as a foundational principle of the European
Union, describing how it is put into practice and supported through language learn-
ing and translation. Taking the university campus as a case study, it argues that
machine translation can be used to foster multilingualism in this context.
1 Introduction
The European Union’s motto “united in diversity” is said to symbolize “the essen-
tial contribution that linguistic diversity and language learning make to the Euro-
pean project” (European Commission 2021). But European Union (EU) policy on
multilingualism is mostly built upon language learning and mobility, both time-
consuming activities. And human language learning presents particular chal-
lenges. After all, there is a limit to the number of languages the average EU
citizen can learn. The aim in this chapter is to address these challenges by
arguing that machine translation can contribute to the promotion of
multilingualism in Europe and thus to European linguistic diversity.
2 A multilingual EU
It is … an open secret that the EU’s supposedly humane multilingualism is
but an illusion. (House 2003: 561)
languages. As of 30 January 2020, the standard contained entries for 7,868 lan-
guages (Wikizero 2020), around 600 of which are spoken in Europe, and 24 of
which are official languages of the EU. These are: Dutch, French, German, Italian
(since 1958); Danish, English (since 1973); Greek (since 1981); Portuguese, Spanish
(since 1986); Finnish, Swedish (since 1995); Czech, Estonian, Hungarian, Latvian,
Lithuanian, Maltese, Polish, Slovak, Slovene (since 2004); Bulgarian, Irish, Roma-
nian (since 2007) and Croatian (since 2013).
Linguistic diversity is part of Europe’s cultural heritage. In Europe, there are
languages with official status at state level, and indigenous regional and/or mi-
nority languages with different degrees of recognition. The 1998 European Char-
ter for Regional or Minority Languages is the European convention for the protec-
tion and promotion of languages used by traditional minorities. It was reformed
and strengthened by a monitoring mechanism in 2019. The Charter covers 79 lan-
guages used by 201 national minorities or linguistic groups (Council of Europe
2020). They are presented in alphabetical order in Table 1.
According to the Charter, some of these languages are to be protected in just
one country, such as Skolt Sami in Finland, whereas others should be protected in
several countries, such as Slovenian in Austria, Bosnia and Herzegovina, Croatia
and Hungary. Beyond the Charter, there are other languages with different levels
of recognition. For instance, Sardinia, an autonomous region of Italy, recognizes
the Sardinian language as an official language, and Romansh Ladino, Cimbrian
and Mocheno, spoken in certain communes of the mountainous North of Italy,
also have local recognition.
The European Charter for Regional or Minority Languages, however, guar-
antees the rights only of regional minority groups, and not of migrant groups.
What’s more, the Charter has noteworthy absences, such as Breton, spoken in
the North West of France, although a Breton language agency was created by the
Region of Brittany in 2010 to promote daily use of the language.
Multilingualism in Europe is also enhanced by immigration and mobility.
There have been intra-European migrations, leading, for example, to Portuguese
being spoken in Andorra and Polish in Ireland, alongside languages tradition-
ally spoken outside the EU, such as Mandarin Chinese or Arabic. In the Multi-
lingual Cities Project (Extra & Yagmur 2005), home language surveys amongst
pupils both in primary and secondary schools were collected in Brussels, Ham-
burg, Lyon, Madrid, The Hague and Göteborg. The list of collected languages was
the following: Romani, Turkish, Urdu, Armenian, Russian, Serbian/Croatian/Bos-
nian, Albanian, Vietnamese, Chinese, Arabic, Polish, Somali, Portuguese, Berber,
Kurdish, Spanish, French, Italian, English, German. The authors of the study
reached an obvious but provocative conclusion:
recognize national and regional minority languages, and language policies are
highly controversial.
Meanwhile, the EU prides itself on standing up for language diversity through
the use of the 24 official languages in the main EU institutions. From a practi-
cal point of view, this position involves a major challenge that deserves closer
attention.
For instance, in the European Parliament, parliamentary documents are pub-
lished in all the official languages “as EU citizens must be able to read legislation
affecting them in the language of their own country” (European Parliament 2020)
and members of the European Parliament have the right to speak and write in
any of the official languages. Rule 167 of the Rules of Procedure of the European
Parliament is related to languages, and specifies that: (i) all documents of Par-
liament shall be drawn up in the official languages; (ii) all members shall have
the right to speak in Parliament in the official language of their choice; (iii) in-
terpretation services shall be provided and (iv) the President of the Parliament
shall rule on any alleged discrepancies between the different language versions
(European Parliament 2021).
As for the citizens of the EU, according to the Treaty on the Functioning of the
European Union (TFEU1 ), all European citizens have the right to address the offi-
cial EU institutions in any of the EU’s official languages and to receive an answer
in that language. This is intended to make the EU institutions more democratic
and accessible to EU citizens. Other provisions related to multilingualism in the
TFEU are contained in articles 20, 24 and 342.
Some people think that 24 official languages is too many, and others that 24
official languages is not enough. Some countries try alternative approaches. For
instance, Catalan, Euskara and Galician, all spoken in Spain, are considered “ad-
ditional languages” by the EU (they are co-official languages together with Span-
ish in their respective territories). This status means that any communication
from an EU citizen in these languages has to be translated in Spain into a “pro-
cedural language” of the EU, and the answer from the EU institution will also be
translated from the procedural language into the additional language. The cost
of these translations is borne by Spain.
The use of three procedural languages, English, French and German, is in-
tended to simplify multilingual communication in the EU: given 24 official lan-
guages, the EU is faced with a total of 552 possible translation combinations,
“since each language can be translated into 23 others” (European Parliament
2020), and this would be difficult to handle for all EU documentation. For this
reason, there are norms to establish which documents are translated into the
other 23 languages and which are translated just into the three procedural
languages.

1 The most recent, consolidated version of the TFEU is available from https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:12012E/TXT&from=EN. Unless otherwise indicated, all URLs mentioned in this chapter were last accessed in January 2022.
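The arithmetic behind the 552 figure can be checked directly: with 24 official languages, each ordered source–target pair (source ≠ target) is one translation direction, giving 24 × 23 = 552. The following minimal Python sketch is purely illustrative and not part of the chapter itself:

```python
from itertools import permutations

OFFICIAL_LANGUAGES = 24  # EU official languages since Croatian was added in 2013

# Each ordered (source, target) pair with source != target is one
# translation direction: n * (n - 1) in total.
directions = OFFICIAL_LANGUAGES * (OFFICIAL_LANGUAGES - 1)

# The same count, obtained by explicitly enumerating ordered pairs.
pairs = list(permutations(range(OFFICIAL_LANGUAGES), 2))

print(directions)   # 552
print(len(pairs))   # 552
```

Adding a 25th official language would raise the total to 25 × 24 = 600 directions, which illustrates why the number of combinations grows so quickly with each enlargement.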
The European Commission’s Directorate-General for Translation (DGT) trans-
lates texts for the institutions and the citizens of the EU. As of 2022 it produces
more than 2.75 million translated pages per year, 91% of which are translations
from English, 2% from French, just under 1% from Spanish, and slightly less again
from German. Other source languages combined account for around 5% of trans-
lation activity. Of all translated documents, 63% are translated internally by the
DGT and 37% are outsourced to external companies. Some 55% of translations
involve EU law-making, 22% external communication and the web, 12% commu-
nication with other EU institutions and national parliaments, 5% correspondence
with EU citizens, 4% other official documents, and 2% public consultation on EU
policies. The translation budget for 2022 was 355 million euros, or 0.2% of the
whole EU budget (European Commission, Directorate-General for Translation
2022).
Interestingly, the above-mentioned report from the High Level Group on Mul-
tilingualism, which provided our definition of multilingualism, mentions “the
potential of multilingual electronic tools as support for non-specialist users of
second and third languages” (European Commission 2007) as a research area.
Likewise, the European Commission’s communication on “Multilingualism – an
asset and a commitment” (European Commission 2014) claims that “the language
gap in the EU can be narrowed through the media, new technologies and trans-
lation services”. This book aims to make a contribution precisely to this field.
3 https://ec.europa.eu/education/policies/linguistic-diversity_en
language learners/users lie at the heart of the work of the Language Pol-
icy Programme. Whatever their status, all languages are covered: foreign
languages, major languages of schooling, languages spoken in the family
and minority or regional languages, as well as a specific programme on the
linguistic integration of migrants and refugees.
Initiatives to foster multilingualism are many and varied, but language learning
deserves closer attention, especially given the EU’s above-mentioned “mother
tongue plus two” policy.
Some of the EU’s recent initiatives to improve language skills include the
European Centre for Modern Languages (www.ecml.at); the Eurydice Report
(Eurydice 2019), which provides information on policy efforts in Europe that support
the teaching of regional or minority languages in schools; the Online Linguistic
Support (OLS) platform;8 the Common European Framework of Reference for
Languages (CEFR); and Erasmus+ mobility programmes.
European projects funded to improve language learning deserve special atten-
tion. Methodologies, languages and countries involved vary enormously from
one project to another. Table 2 lists some interesting examples.
Eurostat, the website for European Statistics,9 provides statistics on the sec-
ond and foreign languages studied by pupils at different education levels in the
EU. According to Eurostat data from 2019, English was by far the most popular
language at lower secondary level, studied by 86.8% of pupils, followed by
French (19.4%), German (18.3%) and Spanish (17.5%) (Eurostat 2022).
Another interesting question is how many students learn two or more foreign
languages, as recommended by the European Council (2002): it is known that
7 https://www.coe.int/en/web/language-policy/home
8 https://erasmusplusols.eu/en/about-ols/
9 https://appsso.eurostat.ec.europa.eu
89.9% (almost 14 million) of secondary level pupils studied at least one for-
eign language in 2019 (Eurostat 2022). Among them, more than 7 million (48.1%)
studied two or more foreign languages.
In short, of the 600 languages spoken around Europe by more than 700 million
speakers (EU and non-EU) (World Bank 2020), the majority of EU students are
learning one or two out of the following four as their first or subsequent foreign
language: English, French, German or Spanish.
This indicates that the survey respondents were not satisfied with the level
they achieved at the end of compulsory education or they did not have a
chance to maintain their level. One third of surveyed young Europeans said
they were unable to study in a language other than the one they used in
school (i.e. often the mother tongue). (European Commission 2019: 102)
language learner, it could, in theory, be used to help learners read complex texts
and develop more advanced written skills in their second language. They could
learn how to make the most of machine translation in the second language so that
they could detect and edit machine translation mistakes based on their knowl-
edge of the second language. And while empirical studies of the use of machine
translation in language classes are still thin on the ground, a small number of
sources suggest interesting avenues of research. Relevant studies are discussed in
Carré et al. (2022 [this volume]), which includes further ideas and strategies that
can be used in language learning classes. Yet others are included in the database
of activities of the MultiTraiNMT project (MultiTraiNMT 2020). My view is that
there are many ways of using machine translation in language learning classes,
and there is no need to forbid it as long as it is used in a conscious and critical way.
On occasion, however, there is just no time to train second language students.
Indeed, in the history of machine translation there have been many occasions
on which research was partially triggered by a perceived lack of people learning
a particular foreign language. Cold War research into Russian-English machine
translation is one such case (Gordin 2016). More recently, the Japanese organiz-
ers of the Tokyo 2020 Olympic Games (actually held in 2021 due to COVID-19)
realized that learning Japanese was out of the question for most foreigners and
that they needed a faster approach to overcome language barriers during the
Olympics. The Japanese internal affairs ministry thus allocated ¥1.38 billion to
machine translation research to improve the quality of real-time speech trans-
lation technology, with the aim of covering 90% of the language needs of the
Olympic teams and tourists who, it was hoped, would go to Japan (Murai 2015).
The Japanese government funded the research for a specific machine translation
system to be used during Tokyo Olympics and private companies were tasked
with the development of devices and mobile apps to run the system. The plan
was that companies would recover their investment by selling the devices and
app subscriptions to users. In this case, the introduction of machine translation
was the chosen shortcut to bring multilingualism to Japan, instead of language
learning.
• International students could mix with local students, and have machine
translation resources available to follow the class in any local language
as there would be: (i) teaching materials in different languages (assuming
copyright issues have been resolved); (ii) access to multilingual glossaries
and databases for specialized terminology; (iii) a recording of the class us-
ing available voice recognition, transcription and machine translation fea-
tures, etc.
4 Conclusion
This chapter has championed the idea of machine translation as a tool to foster
multilingualism in Europe. As seen in the chapter, the EU has published charters,
treaties and parliamentary documents promoting multilingualism as a core value
in Europe that has to be fostered and preserved. However, despite all efforts and
resources put into language learning, the goal of learning one’s “mother tongue
plus two” is difficult to reach. On the one hand, in practice, most EU citizens
are learning only English as a foreign language. On the other hand, language
learning is a long, slow process. In this context, machine translation
seems to offer some support to those who do not have the time or resources to
keep learning more and more languages.
The chapter has also explored the case of universities as small multilingual
communities that can design language policies to promote multilingualism. Language
policies may generate tensions on campuses for a number of reasons, but most
campuses are multilingual in practice nowadays, either through the internation-
alization/Englishization of the university or due to the arrival of foreign students.
In this context, the chapter explores the need to design language policies that
acknowledge the potential of machine translation to facilitate multilingualism,
without forgetting the challenges that machine translation presents, especially
those related to quality and ethics. As we say in Spanish, my aim here is to abrir
el melón (literally to ‘open the melon’) of machine translation in multilingual-
ism and language learning. Opening the melon means tackling a question that
needs to be dealt with sooner or later, although nobody wants to do it because
the consequences are unknown. In other words, nobody knows if the melon will
be sweet enough to eat, but there is only one way to find out. Even if existing ma-
chine translation systems do not communicate the non-literal meaning of abrir
el melón, anyone reading a literal machine translation will still learn a useful
Spanish metaphor. And who knows? This metaphor may even travel to new lan-
guages and cultures, as it allows a long and complex meaning to be conveyed in
just three words. This is multilingualism in action.
References
Carré, Alice, Dorothy Kenny, Caroline Rossi, Pilar Sánchez-Gijón & Olga Torres-
Hostench. 2022. Machine translation for language learners. In Dorothy Kenny
(ed.), Machine translation for everyone: Empowering users in the age of artificial
intelligence, 187–207. Berlin: Language Science Press. DOI: 10.5281/zenodo.6760024.
Cenoz, Jasone. 2013. Defining multilingualism. Annual Review of Applied Linguis-
tics 33. 3–18. DOI: 10.1017/S026719051300007X.
Council of Europe. 2020. Languages Covered by the European Charter for Regional
or Minority Languages. https://www.coe.int/en/web/european-charter-regional-or-minority-languages/languages-covered.
Council of the EU. 2008a. Council conclusions of 22 May 2008 on multilingualism
(2008/C 140/10).
Council of the EU. 2008b. Council resolution of 21 November 2008 on a European
strategy for multilingualism. OJ C 320 16.12.2008. 1–3.
Council of the EU. 2011. Council conclusions on language competences to enhance
mobility 2011. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:C:
2011:372:0027:0030:EN.
Council of the EU. 2014. Conclusions on multilingualism and the development of
language competences. https://www.consilium.europa.eu/uedocs/cms_data/
docs/pressdata/en/educ/142692.pdf.
European Commission. 2007. Final report from high level group on multilingualism.
ISBN: 978-92-79-06902-4. https://op.europa.eu/en/publication-detail/-/publication/b0a1339f-f181-4de5-abd3-130180f177c7.
European Commission. 2008. Multilingualism: an asset for Europe and a shared
commitment. Communication from the Commission to the European Parlia-
ment, the Council, the European Economic and Social Committee and the
Committee of the Regions. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX.
European Commission. 2013. European higher education in the world. Commu-
nication from the Commission to the European Parliament, the Council, the
European Economic and Social Committee and the Committee of the Regions.
COM (2013) 499. https://ec.europa.eu/transparency/regdoc/rep/1/2013/EN/1-
2013-499-EN-F1-1.pdf.
Chapter 2
Human and machine translation
Dorothy Kenny
Dublin City University
This chapter introduces the reader to translation and machine translation. It at-
tempts to dispel some myths about translation, and stresses the importance of
translators in creating equivalence between source and target texts. Ultimately,
the chapter aims to help readers construe human-produced translations as training
data for machine translation. The chapter goes on to present some of the most use-
ful distinctions made in machine translation: between types of machine translation
systems and different uses of machine translation output. In particular, it attempts
to explain contemporary machine translation as an application of the branch of
artificial intelligence known as machine learning, and, more specifically, of deep
learning.
1 What is translation?
This is a book about machine translation, which can be succinctly defined as
translation performed by a computer program. This definition still leaves open
the question, however, of what translation is. The reader should be made aware,
at this point, that there is a vast amount of scholarship in the area known as
translation studies that asks precisely this question, and that tracks the role of
translation in diverse cultural, scientific and political arenas, to name just a few. It
would be impossible to do justice to this rich field here, and the reader is referred
instead to sources such as Baker & Saldanha (2020) for further information. We
will content ourselves here by saying that most commentators would agree that
translation is the production of a text in one language, the target language, on
the basis of a text in another language, the source language. The notion of text is
important. It refers to instances of real language use, whether spoken or written.
In general, we expect texts to meet certain criteria: they should be coherent and
“hang together” properly; they should serve some kind of purpose, even if it is just
to say “hello” to someone. We also usually have particular expectations regarding
what texts will or should be like, given the particular language and context. This
chapter, for example, hopefully meets the reader’s expectations of a chapter in a
collected English-language volume that is designed to be used as a textbook. It
addresses a particular subject field or domain, namely machine translation, and
adopts the conventions of a particular genre, that of a textbook.
The idea that translation involves texts is old hat to anyone who works in
the area; it is so obvious that it doesn’t need to be said. But in a world where
most people don’t think too much about translation, it is worth reminding our-
selves that we translate texts and not languages. Languages are vast, complicated,
abstract systems that are put to use in potentially infinite examples of human
communication and expression. Texts are concrete instances of language in use.
They normally have recognizable beginnings and endings, and even if individual
languages seem to offer endless potential for creating sometimes unpredictable
meanings and high levels of ambiguity, in any given text much of that potential
simply falls away. It does not matter, for example, that shower in English can
mean (1) a brief period of rain, (2) a device used for personal washing, or (3) a
gift-giving party, all of which would be translated differently into a language
like French, if what we are doing is translating a shower installation guide for a
manufacturer of bathroom fittings. Unless the author is engaging in some witty
wordplay, which is unlikely given the genre, we are dealing with the second
meaning of shower. Focusing on texts rather than languages keeps things real,
and manageable.
A second element of the definition of translation given above is the contention
that translation involves the production of a text on the basis of another, pre-
existing text. This clearly establishes translation as involving a relationship be-
tween two texts, commonly known as the source text and the target text.1 Some
commentators would go further than this and say that the relationship in ques-
tion is one of having the “same meaning”, but many philosophers and linguists
– who understand meaning admittedly in quite sophisticated, technical ways –
tend to shy away from claims of “same meaning” in translation. One reason for
doing so is that it can be difficult to isolate the meaning of a text from the situ-
ations in which it is created and used. We might consider the meaning of a text
1 A third element of our definition, of course, relates to the fact that source and target texts are
in two different languages. We are thus concerned with interlingual translation. Some com-
mentators, most notably Jakobson (1959), have recognized other types of translation, such as
intralingual and intersemiotic, but a discussion of these categories is beyond the scope of this
chapter.
to be what its writer or speaker wanted to say, but often we cannot be sure what
they intended. Or we can associate meaning with our own interpretation of a
text, but then we have to concede that other people might interpret the same
text in a different way. A further issue that arises in the context of translation is
that a perfectly valid target text may say more or less than its source text, simply
because the language it is written in requires it to do so.
An example might help here. The opening line of a fairly recent memoir (Tam-
met 2006) is reproduced in example (1):
Despite almost total word-for-word alignment between the two sentences, the
French sentence actually says more than the English. It tells the reader that the
writer, the I in English, is male, because if the writer was female, then the correct
form in example (2) would be née and not né. Given certain tense forms, involving
certain verbs, written French is obliged to signal the sex of the person in question.
But how does the translator into French know that the person saying “I” is
male? This is, after all, the opening line of the book. Well, the book is a memoir,
and the conventions of the genre require the enunciating subject to be the author
of the memoir, and the translator knows whose book he is translating. It says it in
the contract and on the cover of the book. The fact that French needs to specify
the sex of a person in certain situations where this can be left vague in English
does not cause the translator any headaches. It is a non-problem; but this very
simple example shows two important things: the first – already mentioned – is
that sometimes translations can mean more than their source texts. The second
is that sometimes information that is required to translate a sentence cannot be
found in that sentence. Rather one has to look into (1) the wider text – the front
cover, for example – which is sometimes also called the co-text, the text that goes
with a given fragment of text, or (2) the context, understood here as the wider
situation that is relevant to the text, to find out how to proceed.
In other cases, a translation might say more than its source text not because
the target language requires it, but because the genre does. In a study involving
user interfaces for computer-aided design tools, Moorkens (2012) found that the
single-word heading Selecting in English was commonly translated in a way that
made explicit what was to be selected, yielding a variety of different translations,
a sample of which is presented below, back-translated into English:
and so on.
This kind of explicitation, which results in translations saying more than their
source texts, is not uncommon. The converse can also happen of course; in cases
where it would be impossible or unusual for a target text to be as explicit as its
source text, the translator can choose to leave out information. This can some-
times happen for language-typological reasons. For example, English belongs to
a group of languages that frequently use verbs to describe the manner in which
something or someone moves. Spanish, on the other hand, tends to use verbs
to describe the path that is followed; it can encode the manner of motion in an
adverbial phrase, but sometimes translators into Spanish will choose not to refer
to manner of motion at all, as to do so would give it undue prominence, from the
Spanish point of view. Slobin (2003) gives examples (7) and (8), by way of illus-
tration. While the verb “stomped” in the English describes a way of walking in
which the feet strike the ground heavily and noisily, the verb in Spanish “salió”
simply captures the fact that the character in question has left the house.
There is a second way in which the Spanish sentence in (8) says less than
its English counterpart in (7): the Spanish does not contain a subject pronoun
equivalent to “he”. This is because Spanish is predominantly a pro-drop language,
meaning it can happily omit subject pronouns as most of the information they
contain is available anyway from the ending on the verb in question, in this
case, “salió”, which indicates third-person singular, past tense. What’s missing in
Spanish but present in English is, of course, the gender of the subject. A reader
of the Spanish text will, however, carry over knowledge of the (male) subject
from the earlier co-text, and so they are not left in the dark. So by omitting the
pronoun in Spanish, the translator has followed the norms of the target language
and done no harm to the reader’s ability to know what is going on in the novel.
The arguments and examples given above are intended to explain why so
many scholars are reluctant to say that a source and target text have the same
meaning. What we are more likely to agree on is the idea that translations ap-
proximate their source texts. For all sorts of reasons, translators have to make
decisions about what to prioritize when translating, what they need to say and
what they should leave up to readers to work out for themselves.2 The mean-
ings that they help target-text readers to construct for themselves are likely to
be compatible to a very large extent with the meanings that source-text readers
construct, but in many cases they will not be identical. And that is generally not
a problem.
But if we cannot call the relationship between a source text and a target text –
or more probably snippets of such texts – one of “same meaning”, then what can
we call it? One answer is to call this relationship one of equivalence. Equivalence
as a term has a chequered history in translation studies, but if it is understood as a
relationship that emerges from the decision-making of a translator, a relationship
that arises between two text snippets because the translator has deemed them to
be of equal value in their respective co-texts and contexts, then equivalence can
be a perfectly serviceable term. It allows us to say things like “salió” in example
(8) is equivalent to “he stomped” in example (7). This equivalence is clearly not
fixed for all eternity, and it certainly cannot be generalized to all other contexts in
which the word “stomped” might appear, but this does not matter, if we concede
that “salió” was a fair exchange for “he stomped” in this particular case.3
You might ask why it is so important to sketch how human translators work
in a book about machine translation. The answer is twofold: firstly, in a very
real way (elaborated upon by Rossi & Carré 2022 [this volume]) human trans-
lation sets the standard by which machine translation is judged, and anything
that contributes to the maintenance of high quality in human translation is ul-
timately of relevance to machine translation. Likewise, human translation pro-
cesses can help to put into sharp relief occasional deficits in machine translation.
Human translation has a role to play, in other words, in both the evaluation of
machine translation output and in the diagnosis of problems in that output. Sec-
ondly, and even more crucially, most contemporary machine translation relies
on translations completed by humans to learn how to translate in the first place.
This point is expanded upon below.
Before we close off our discussion of how human translators work, however,
we need to introduce a technology that has become indispensable for many
translators: translation memory.
4 Translation memory
In the 1990s translators working in the burgeoning software localization industry
found themselves translating texts that were either extremely repetitive in them-
selves or that repeated verbatim whole sections of earlier versions of a document.
This was the case, for example, with software manuals that had to be updated any
time there was a new release of the software. Rather than translate each sentence
from scratch, as if it had never been translated before, they invented a tool that
would store previous translations in a so-called translation memory, so that they
could be reused. The tool, known as a translation memory tool, would take in
a new source text, divide it into segments – sentences or other sensible units
like headings or cells in tables – and then compare each of these segments with
the source-language segments already stored in memory. If an exact match or
a very similar segment was found, then the corresponding target-language seg-
ment would be offered to the translator for re-use, with or without editing. As
translators worked their way through a new translation assignment, they would
get hits from the translation memory, accept, reject or edit the existing transla-
tion and update the memory as they went along, adding their own translations
for the source-language segments for which no matches existed. Over time, the
translation memories grew extremely large. Some companies who were early
adopters of the technology built up translation memories containing hundreds
of thousands and then millions of translation units, that is, source-language
segments aligned with their target-language segments. Example (12) shows a simple
translation unit based on a headline (in English and German) taken from a trans-
lation memory consisting of data from the website of the European Parliament.
It is presented in a format known as tmx (for “translation memory exchange”).
The tags <tu> and </tu> open and close the translation unit, the tags <tuv> and
</tuv> open and close each variant within the translation unit,5 and the tags
<seg> and </seg> open and close the segment or text string in that language.
(12) <tu>
       <tuv xml:lang="EN">
         <seg>A common blacklist for unsafe airlines</seg>
       </tuv>
       <tuv xml:lang="DE">
         <seg>Unsichere Luftfahrtunternehmen kommen auf eine schwarze Liste</seg>
       </tuv>
     </tu>
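The matching process described above can be sketched in a few lines of code. The following Python fragment is an illustrative sketch, not any real tool's implementation: it stores the translation unit from example (12) in a tiny in-memory translation memory and then looks up a new, slightly different segment. The 0.7 similarity threshold and the query segment are made up for the example.

```python
import difflib
import xml.etree.ElementTree as ET

# The translation unit from example (12), in tmx-style markup.
TMX_UNIT = """<tu>
<tuv xml:lang="EN"><seg>A common blacklist for unsafe airlines</seg></tuv>
<tuv xml:lang="DE"><seg>Unsichere Luftfahrtunternehmen kommen auf eine schwarze Liste</seg></tuv>
</tu>"""

def load_unit(tmx_string):
    """Extract the (source, target) segment pair from a single <tu> element."""
    tu = ET.fromstring(tmx_string)
    segments = [tuv.find("seg").text for tuv in tu.findall("tuv")]
    return segments[0], segments[1]

def best_match(new_segment, memory, threshold=0.7):
    """Return (source, target, score) for the stored segment most similar
    to new_segment, or None if nothing clears the threshold."""
    best, best_score = None, threshold
    for source, target in memory:
        score = difflib.SequenceMatcher(
            None, new_segment.lower(), source.lower()).ratio()
        if score >= best_score:
            best, best_score = (source, target, score), score
    return best

memory = [load_unit(TMX_UNIT)]
# A near-match ("airports" instead of "airlines") still retrieves the
# stored German translation, which the translator can then edit.
hit = best_match("A common blacklist for unsafe airports", memory)
```

Real translation memory tools use more sophisticated fuzzy-matching algorithms and report match percentages, but the principle — compare the new source segment against stored source segments and offer the corresponding target segment for reuse — is the one sketched here.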
were extracted from the web and aligned with each other to create the multilin-
gual Europarl Corpus (Koehn 2005), which in turn gave a significant boost to
machine translation research. Aligned parallel corpora do not have to be in tmx
format. Often they take the form of files with thousands (or even millions) of
lines, each line occupied by a single sentence, whose position in the file matches
exactly that of its translation in another file in a given target language, so line x in
the target language file contains the translation of line x in the source language
file.
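Reading such a line-aligned pair of files reduces to pairing line x of one file with line x of the other. Below is a minimal Python sketch, using in-memory stand-ins for the two files; the example sentences are merely illustrative of Europarl-style data.

```python
from io import StringIO

# Stand-ins for two line-aligned corpus files (illustrative content only).
source_file = StringIO(
    "Resumption of the session\n"
    "I declare resumed the session\n"
)
target_file = StringIO(
    "Wiederaufnahme der Sitzungsperiode\n"
    "Ich erkläre die Sitzung für wiederaufgenommen\n"
)

# Line x of the target file translates line x of the source file,
# so aligning the two corpora is just a zip over their lines.
pairs = [(s.strip(), t.strip()) for s, t in zip(source_file, target_file)]
```

With real data the `StringIO` objects would simply be replaced by `open()` calls on the two corpus files.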
engines like Google Search or Microsoft Bing, for example, machine translation
can be used to expand a search and then to translate relevant foreign-language
web pages back into the user’s language.
But it’s not all about web pages. Machine translation is also used in combina-
tion with technologies like automatic speech recognition and speech synthesis,
or optical character recognition and digital image processing, allowing users to
have spoken conversations in two or more languages, or read road signs writ-
ten in unfamiliar writing systems, often using an app installed on their mobile
phones. In some cases, these apps now even work offline and users can justifi-
ably claim to be carrying a machine translation system in their pocket. Machine
translation is also increasingly used in areas previously considered beyond the
capacity of the technology, for example in audio-visual translation, to translate
the subtitles of foreign-language movies and TV series into the language of a
new market. Indeed, subscription video streaming services thrive on a model
that brings the so-called long tail of lesser-known titles to a new audience, and
many of these titles are lesser-known partly because they were originally made
in a foreign language. Audio-visual content is thus becoming just the latest in a
long line of commercial products whose markets can be expanded through ma-
chine translation. In the seventy or so years since its inception, machine transla-
tion has thus moved from being the preserve of governments and international
organizations to being a mass consumer good.
Despite the undoubted usefulness of machine translation in the kind of scenar-
ios addressed above and its capacity to do good in other, for example, humanitar-
ian settings (Nurminen & Koponen 2020), it comes with some health warnings.
First, just like human translators, machine translation systems can make mis-
takes. Errors might range from the amusing but trivial to the extremely serious
(for example in healthcare, news translation or international diplomacy). Whole
branches of research are thus devoted to estimating the quality that given ma-
chine translation systems are likely to produce, evaluating particular outputs, de-
signing ways to correct errors by post-editing machine translation output or help-
ing the machine produce better output in the first place, usually by pre-editing
source texts to make them easier to translate. These areas are discussed in detail
in Chapters 3 to 5 of this book. Machine translation also raises a surprising num-
ber of moral and legal issues, as addressed by Moorkens (2022 [this volume])
on ethics, and to a lesser extent by Carré et al. (2022 [this volume]) on machine
translation for language learners.
Many casual users of machine translation may feel that they do not need to
know much about any of these areas to get what they need from the technology:
if you are simply using machine translation to get the gist of a text, to understand
the basic contents of a web page, for example, then this might be true. Such uses,
which often fall under the heading of machine translation for assimilation, gener-
ally involve low-stakes, private use of the translated text in question, with little
risk of reputational or other damage. If, however, you want to use machine trans-
lation for dissemination, for example to publish your blog in a second language,
or to advertise your business, then it is wise to understand the risks involved and
even to take measures to mitigate them. The ability to do so is a component of
what is now known as machine translation literacy (Bowker & Ciro 2019). Other
components include having a basic understanding of how machine translation
actually works, and of the wider societal, economic and environmental implica-
tions of its use. While this might seem like esoteric knowledge, it turns out to be
highly transferable, as contemporary machine translation is based on the same
principles as a whole host of other technologies that are contributing to profound
changes in many aspects of contemporary life, and especially how we work. In
short, machine translation is now, for the most part, an application of machine
learning, and more specifically of deep learning. These concepts are explained
briefly below, and treated in greater depth by Pérez-Ortiz et al. (2022 [this vol-
ume]) on how neural machine translation works. If you are a translation student,
a professional translator, or are employed in some other capacity in the trans-
lation industry, then you are probably strongly motivated to learn about what
happens “under the hood” in machine translation systems. You are probably also
interested in how you can get the best out of the technology, by customizing it
for your needs. This is addressed in Ramírez-Sánchez (2022 [this volume]). The
following paragraphs, on the other hand, should be read by anyone who is cu-
rious about how machine translation can be said to be the linguist’s entrée into
the wonderful world of machine learning.
[7] Although RBMT has fallen out of favour generally, at the time of writing it is
still used in a small number of systems, especially for translation between very
closely related languages. See, for example, Apertium (Forcada et al. 2011).
Italian        English            Probability
a me piace     I like             0.78
a me piace     I should like to   0.11
a me piace     I admire           0.11
[8] A statistical model is a mathematical representation of observed data.

[9] The example is greatly simplified, as it shows only sensible Italian-English
pairings. In reality, an SMT system would learn a translation model that contains
lots of nonsensical pairings, most of which would, however, be assigned very low
probabilities. It would also reserve some probability mass for previously unseen
pairings.
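The kind of translation model shown above can be thought of as a lookup from source phrases to scored candidate translations. The following toy Python sketch is as simplified as the table itself; a real SMT system would combine such a model with a language model and would contain vastly more (often nonsensical, low-probability) pairings.

```python
# A toy phrase-translation model in the spirit of the table above: each
# source phrase maps to candidate translations and their probabilities.
translation_model = {
    "a me piace": {
        "I like": 0.78,
        "I should like to": 0.11,
        "I admire": 0.11,
    },
}

def most_probable(phrase):
    """Return the highest-probability candidate for a source phrase,
    or None if the phrase was never seen in training."""
    candidates = translation_model.get(phrase)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Calling `most_probable("a me piace")` picks the candidate with the highest learned probability; unseen phrases return nothing, which is where a real system's reserved probability mass for unseen pairings would come into play.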
more sense if we say that vectors are quite good at representing relationships
between words. The vector [1.20, 2.80, 5.50], for example, could be the vector for
pear. It differs from the vector for apple in just the last number. If we see the
numbers in the vector as representing dimensions in an imaginary three dimen-
sional space, this would make the words apple and pear very close to each other.
And presumably they would both be far from less related words, like helicopter
or very. Vectors have other interesting properties that make them particularly
attractive to computer scientists. You can add a vector to another vector, for ex-
ample, or multiply them and so on. Try doing that with the words themselves,
or with drawings of apples and pears!
So how did our vectors for apple and pear end up so suspiciously similar in
the above example? The truth is, we just made them up. In a real NMT scenario,
we would get a computer program to learn suitable vectors for all instances of all
words in our corpus directly from that corpus. (Remember, in machine learning,
the computer program has to work these things out for itself, with or without
human supervision.) The vector-based representations of words that the machine
learns are called word embeddings. The reason why embeddings for related words
end up looking similar to each other is that they are built up on the basis of where
particular words are found in the training data. If it turns out that two words tend
to keep turning up in the same or similar co-texts – both apple and pear occur
very regularly before the word tree for example; both appear regularly after peel,
slice and dice – then they will end up with similar embeddings.
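The closeness between such vectors can be measured with cosine similarity. In the Python sketch below, the three-dimensional vectors are made up for illustration: the pear vector is the one given in the text, the apple vector is assumed (differing from pear only in the last number, as the text describes), and helicopter is deliberately dissimilar.

```python
import math

# Made-up three-dimensional embeddings for illustration.
vectors = {
    "apple": [1.20, 2.80, 5.40],       # assumed, per the text's description
    "pear": [1.20, 2.80, 5.50],        # as given in the text
    "helicopter": [9.10, 0.30, 0.70],  # an unrelated word, far from both
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# apple and pear point in almost the same direction; helicopter does not.
fruit_similarity = cosine_similarity(vectors["apple"], vectors["pear"])
odd_one_out = cosine_similarity(vectors["apple"], vectors["helicopter"])
```

Real word embeddings have hundreds of dimensions rather than three, but the same arithmetic applies, which is exactly why vectors are so attractive for representing relationships between words.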
Word embeddings are not built in one go, but rather in successive layers, as
described in Pérez-Ortiz et al. (2022 [this volume]). An artificial neural network
that has multiple layers sandwiched between its external layers is known as a
deep neural network.
Deep learning, in turn, is simply the branch of machine learning that uses mul-
tiple layers to build representations. In a deep neural network, the external layers
correspond to inputs and outputs of the network and are visible to the human
analyst. The intermediary, or hidden, layers have traditionally been less open to
scrutiny, however, giving deep learning a reputation for opacity, and encourag-
ing some commentators to misleadingly use the word “magic” to describe the
internal workings of deep neural networks. The mystique of NMT is added to
when big tech companies report on their successes in building multilingual trans-
lation models, sometimes involving hundreds of languages, and which can cope
with translation between languages for which there was no “direct” bilingual
training data.[19] Researchers in AI have not been oblivious to problems caused by
perceived opacity, however, and in the areas known as explainable AI (XAI) and
interpretable AI, efforts are now being made to open up the “black box” of deep
learning, so that its inner workings can be more easily understood by users,
explanations can be provided for particular outputs and systems can be improved
(see, for example, Vashishth et al. 2019).

[19] See https://ai.googleblog.com/2016/11/zero-shot-translation-with-googles.html
and https://about.fb.com/news/2020/10/first-multilingual-machine-translation-model.
Other problems have less to do with the translations that NMT systems output
and more to do with wider environmental and societal concerns: NMT systems
take much longer and much more computing power to train than their predeces-
sors and use up vast quantities of energy in the process. They usually require
dedicated, expensive hardware in the form of graphical processing units. They
also need massive quantities of training data, which are not available for every
language pair.
Improvements in the technology have also led some people to question the
wisdom of learning foreign languages: if a machine can translate anything any-
one else says or writes in a foreign language into your language, why go to all the
trouble of learning their language? Such arguments are based on a very limited
understanding of the benefits of second or foreign language learning, however,
and ignore the fact that machine translation is viable for only a small number
of the world’s languages. They also tend to see machine translation as being in
competition with language learning, rather than possibly being an aid in the pro-
cess. Chapters 1, 6 and 9 of this book have more to say on the broader ethical and
societal issues raised by the use of machine translation in language learning and
other aspects of our lives.
[20] We use Google Translate here simply because it is probably the most familiar
machine translation service. All Big Tech companies offer machine translation
“solutions” of one kind or another, as do a whole host of specialist machine
translation providers.
• a single system may output different translations for the same input de-
pending on the co-text;
(16) A little birdie tells me that you are married. (DeepL UK)
Google Translate, on the other hand, outputs the inappropriately literal trans-
lation in (17).
(17) My little finger tells me you’re married. (Google Translate)
Also at the time of writing, DeepL’s French-to-American English engine out-
puts (18) but if the sentence is changed by a single word as in (19), then DeepL’s
French-to-American English engine performs much better, as seen in (20).
(18) My little finger tells me that you are married. (DeepL US)
(19) Mon petit doigt me dit que tu es parti.
(20) A little birdie tells me that you’ve left. (DeepL US)
By the time the reader reads this, however, the outputs of both systems may
have changed completely, as models are retrained and users correct faulty out-
puts.
10 Conclusions
In one way, NMT is just the latest in a line of technologies designed to auto-
mate translation, albeit one that has risen to prominence remarkably quickly. Its
success could lead to policy makers and ordinary citizens questioning the value
of learning foreign languages or training human translators. But such positions
would ignore the fact that NMT still relies on human translations or at least trans-
lations validated by humans as training data. And because NMT, like other types
of machine translation, is not infallible, its outputs still need to be evaluated
and sometimes improved by people who can understand both source and target
texts. There is also a pressing need for machine translation literacy among even
casual users of the technology, so that they do not suffer unnecessarily because
of ignorance of how the technology works. Given the right conditions, NMT can
be a vital pillar in the promotion and maintenance of multilingualism, alongside
language learning and continued translation done or overseen by humans. The
rest of this book is dedicated to creating those conditions.
References
Baker, Mona & Gabriela Saldanha (eds.). 2020. The Routledge encyclopedia of trans-
lation studies. 3rd edition. London/New York: Routledge.
Bao, Guangsheng, Yue Zhang, Zhiyang Teng, Boxing Chen & Weihua Luo. 2021.
G-transformer for document-level machine translation. In Proceedings of the
59th annual meeting of the Association for Computational Linguistics and the
11th international joint conference on Natural Language Processing (volume 1:
long papers), 3442–3455. Association for Computational Linguistics.
https://aclanthology.org/2021.acl-long.267.
Bentivogli, Luisa, Arianna Bisazza, Mauro Cettolo & Marcello Federico. 2016.
Neural versus Phrase-Based Machine Translation quality: A case study. In
EMNLP 2016. arXiv:1608.04631v1.
Bowker, Lynne & Jairo Buitrago Ciro. 2019. Machine translation and global re-
search. Bingley: Emerald Publishing.
Carré, Alice, Dorothy Kenny, Caroline Rossi, Pilar Sánchez-Gijón & Olga Torres-
Hostench. 2022. Machine translation for language learners. In Dorothy Kenny
(ed.), Machine translation for everyone: Empowering users in the age of artificial
intelligence, 187–207. Berlin: Language Science Press. DOI: 10.5281/zenodo.6760024.
Caswell, Isaac. 2022. Google Translate learns 24 new languages.
https://blog.google/products/translate/24-new-languages/.
Forcada, M. L., M. Ginestí-Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J. A.
Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez & F. M. Tyers. 2011.
Apertium: A free/open-source platform for rule-based machine translation.
Machine Translation 24(1). 1–18.
Forcada, Mikel. 2017. Making sense of neural translation. Translation Spaces 6(2).
291–309.
Goodfellow, Ian, Yoshua Bengio & Aaron Courville. 2016. Deep learning. Cam-
bridge, MA: MIT Press.
Hutchins, John (ed.). 2000. Early years in machine translation: Memoirs and
biographies of pioneers. Amsterdam/Philadelphia: John Benjamins.
Jakobson, Roman. 1959. On linguistic aspects of translation. In Reuben A. Brower
(ed.), On Translation, 232–239. Cambridge, MA: Harvard University Press.
Johnson, Joseph. 2021. Worldwide digital population as of January 2021.
statista.com/statistics/617136/digital-population-worldwide/.
Joscelyne, A. 1998. AltaVista translates in real time. Language International 10(1).
6–7.
Koehn, Philipp. 2005. Europarl: A parallel corpus for Statistical Machine
Translation. In Proceedings of Machine Translation Summit X, 79–86.
https://aclanthology.org/2005.mtsummit-papers.11.
Koehn, Philipp. 2010. Statistical Machine Translation. Cambridge: Cambridge Uni-
versity Press.
Chapter 3
How to choose a suitable neural
machine translation solution:
Evaluation of MT quality
Caroline Rossi
Université Grenoble-Alpes
Alice Carré
Université Grenoble-Alpes
Machine translation (MT) is evolving fast, and there is no one-size-fits-all solution.
In order to choose the right solution for a given project, users need to compare
and assess different possibilities. This is never easy, especially with MT outputs
that look increasingly good, thus making mistakes harder to spot. How can we
best define and assess the quality of a neural MT solution, so as to make the right
choices? The first step is certainly to define needs as precisely as possible. Hav-
ing defined a pragmatic view of quality, we introduce the key notions in human
and automatic evaluation of MT quality and outline how they can be applied by
translators.
1 Introduction
Beyond the hype about neural machine translation (NMT), users do notice that
machine-translated texts have been getting better. The main point of this chap-
ter is to show that even though machine translation (MT) outputs may appear to
be more fluent than before, they are not necessarily easier to deal with. Besides,
NMT outputs are likely to vary, and should be considered in context and accord-
ing to the needs of end-users. In what follows, we suggest definitions of quality
Caroline Rossi & Alice Carré. 2022. How to choose a suitable neural machine
translation solution: Evaluation of MT quality. In Dorothy Kenny (ed.), Ma-
chine translation for everyone: Empowering users in the age of artificial intelli-
gence, 51–79. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759978
Caroline Rossi & Alice Carré
and measures that can be used to reach beyond the apparent ease and fluency of
NMT outputs.
The overarching question that this chapter seeks to answer is: how can NMT
solutions be assessed in a trustworthy and useful way? The answer may vary,
for example, according to use cases and text types. In what follows, we explain
the key issues with MT evaluation, with a view to helping users to choose an MT
engine that suits their specific needs.
find relatively high tolerance for MT errors (Castilho & O’Brien 2016). Now con-
sider a completely different setting that also involves technical texts, but with
an added legal dimension: users of translated patents need precise and relevant
information, so tolerance for MT errors will be much lower. Looking at NMT
for the patent domain, Castilho et al. (2017: 113) have, for instance, evidenced
a tendency of NMT to omit elements from the source text, in a context where
a piece of information missing from the machine-translated text may have se-
rious consequences. In both cases, a pragmatic approach to quality assessment
would imply using measurable indicators of usefulness, such as user satisfac-
tion ratings, productivity increases in post-editing, or increased sales based on
machine-translated descriptions of products.
Overall, assessing translation quality is far from trivial, and several factors
come into play when evaluating a translation, whether it is done by humans or
machines. For a start, there is usually more than one valid solution in transla-
tion: the same source text can have several translations, all equally acceptable.
What is more, if the evaluation of a translation is entrusted to human evaluators,
the evaluation process will often be subjective: indeed, it is not uncommon for
evaluators to disagree on the level of quality of a given translation. Evaluations
based on what humans do with translations can be objective, however, when
they use productivity measures. Overall, in order to compensate for subjectivity,
it is essential to clearly define the objectives and indicators of each evaluation.
Another disadvantage of human evaluation is that it is also a time-consuming
and resource-intensive process. As an alternative to human evaluation, it is pos-
sible to use algorithms to carry out an automatic evaluation, which is certainly
cheaper and faster than human evaluation, but also sometimes less relevant, be-
cause it may not track usefulness in a particular application. Both types of eval-
uation thus have advantages and disadvantages; and your choice should depend
above all on your translation project and needs.
[a] Lasagnes de grand-mère (French recipe): https://www.750g.com/lasagnes-r66998.htm
You would certainly be surprised at the result, since variation in the French
source text between the singular and plural of “pâte(s)” results in the appear-
ance of dough in a recipe that really only includes pasta. You might be cautious
and cunning enough to guess, but trusting the rather fluent MT output would
have resulted in baking a different dish, and the mistake was induced by just one
letter (a plural ending). Even though the machine-translated text is very fluent
and reasonably accurate on the whole, and would require only small changes
to improve it, we see that one serious issue is enough to make the translation
dysfunctional. The recipe’s relative simplicity, together with knowledge about a
common dish, could help readers work their way around this problem, but with
many other text types and specialized domains these elements won’t apply. What
is more, the evaluation method proposed here involves humans, ingredients and
a kitchen: it would be a very expensive test, and one that is hardly ever used.
Besides such misfires, most MT users are likely to encounter problems with
abstract notions and metaphorical expressions. In the example in Table 2, which
shows part of the blurb for a book published in French alongside its translation
into English by an NMT system, would an English-speaking reader be able to
guess that “a veritable pie in the sky” (English MT output for the French “vérita-
ble tarte à la crème”) meant a well-trodden path or prefabricated subject?
If you’re already used to dealing with MT, you probably recognise these mis-
takes, and a number of others. Experience makes a difference! And the more
[a] French source text: https://www.grasset.fr/livres/ministere-de-linjustice-9782246827504
fluent the MT output, the more caution is needed: recent studies have shown
that students’ correction rates were lower with NMT than other less fluent types
of MT (Yamada 2019). Getting used to the recurrent problems found in NMT out-
puts for a given language pair (and domain) will help you detect them and fix
them more efficiently.
To conclude, even though NMT quality has undeniably been getting better, it
is probably not easier to deal with than other types of MT, and MT is never a
simple recipe for success. Instead, you will need to pay attention to small mis-
takes hidden in a fluent MT output, and to carefully consider your needs before
deciding whether an MT solution is appropriate.
scores for different MT engines or systems. They also typically have functions
designed for use by project managers in translation companies, as well as the
actual evaluators. Non-commercial tools such as PET (Aziz et al. 2012) are also
available to help in human evaluations of MT outputs, and are frequently used
by academic researchers.
Other familiar tools that can be used to support human evaluations include
spreadsheet programs. These allow manual input of scores into tables like that
suggested in Table 3. In-built functions can then be used to compute average
scores for the quality indicators you have used. A variety of free-to-use online
forms can also be used to conduct human evaluations.3 These are particularly
useful for conducting surveys, and can often automatically compute summary
and other statistics in the same way that spreadsheets do.
Table 3: Suggested spreadsheet for comparing MT solutions
3. Perhaps the best known example is Google Forms. See https://support.google.com/docs/answer/6281888?hl=en&co=GENIE.Platform%3DDesktop
• Omissions (words from the source text have been omitted from the target
text)
• Additions (words not in the source text have been added in the target text)
It might turn out that the small sample of MT output selected for evaluation is
not representative enough of each engine’s performance, and, ideally, the com-
parison should be repeated on different samples before choosing the best engine
or system. However, while large institutions may have the means of conducting
large-scale evaluation campaigns, smaller translation services and freelancers
may do better to turn to automatic metrics and measuring post-editing effort.
Important note
Water resistance of the transceiver (IP57: 1 meter / 30 minutes) is assured only
when the following conditions (sic):
While it is written for the general public, the source text is a technical text and
its translation would thus constitute a specialized translation task. It addresses
the domain of radio communication, and therefore has to respect the terminol-
ogy and phraseology of that domain, and its genre is that of a user manual, which
means in turn that it should follow the conventions of such documents. For exam-
ple, each concept should be referred to using one term only (i.e., synonyms are
not permitted), and each term should correspond to one concept only (a property
known as monosemy), instructions should be kept short and simple, and instruc-
tions should all be written following the same pattern. (For more on domains and
genres, see Kenny 2022 [this volume].) In our proposed translation project, the
5. The VX-450 series of Vertex Standard, now discontinued.
target text will have to be translated into French and will have the same function
as the source text: it will be provided to customers along with the transceiver.
In what follows, we will consider this excerpt and the way it was translated by
three different MT tools. The first one (hereafter system A, the output of which
is called candidate A) is eTranslation, the EU’s MT tool.6 The second one (here-
after system B, which outputs candidate B) is Google Translate.7 The third one
(hereafter system D, which outputs candidate D) is DeepL Translator.8 At the
time of writing, these systems are freely accessible to the general public, with
one proviso: eTranslation requires would-be users to register and to belong to
one of three categories of users: SMEs, Public Service Officials and Public Sector
Service Providers.
Some of the more basic AEMs that we will present in this section can be com-
puted by hand. However, for the more complex metrics we will use MutNMT to
compute scores.9 While we are presenting an example to give readers an idea of
how these metrics work, we would like to make it clear here that the exact com-
putation of an AEM score varies depending on the particular implementation
details of each metric: if you use different tools to compute what seems to be the
same AEM (say, BLEU, for instance), you may well get different results.10 The dis-
crepancy in results may have its origins in the way the tool deals with quotation
marks, hyphens, breaking and non-breaking spaces, etc., before computation, in
the way it defines tokens (does it take into account apostrophes, hyphens, punc-
tuation or linguistic information such as lemmas or multiple-word units?), in its
sensitivity to case, or in metric parametrization specifics (e.g., what order of
n-grams is used for the exact implementation?).11 In our example, we changed the
apostrophes in candidate D to those used in the reference translation. This way,
the different coding of smart and straight quotes will not interfere with the AEM
results, and we can focus on the translation output per se. Furthermore, when we
compute AEMs by hand for the purposes of explanation, we consider hyphens
and apostrophes as word “breaks”. This means that the total word count of our
reference translation (see Figure 3) is eight.
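To make this word-break convention concrete, here is a minimal Python sketch of a tokenizer that treats hyphens and apostrophes as breaks, as in our hand computations. The regular expression is our own illustration, not the tokenizer of any actual AEM tool:

```python
import re

def tokenize(text):
    """Split on whitespace, and treat hyphens and apostrophes as word
    breaks, as in this chapter's hand computations."""
    return [t for t in re.split(r"[\s'’\-]+", text) if t]

# Under this convention, “l’émetteur-récepteur” counts as three words:
print(tokenize("l’émetteur-récepteur"))  # ['l', 'émetteur', 'récepteur']
```

A tool that did not split on apostrophes and hyphens would return a single token here, which is exactly why word counts (and hence scores) differ between implementations.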
6. https://webgate.ec.europa.eu/etranslation/translateTextSnippet.html
7. https://translate.google.com/?hl=en
8. https://www.deepl.com/en/translator. Note that we are referring to DeepL as “system D” and not “system C” in order to avoid confusion in cases where we use C to refer to a candidate translation.
9. https://mutnmt.prompsit.com/index
10. Note, however, that there has recently been an effort to normalize and group reference implementations of AEMs in software such as Matt Post’s sacrebleu (Post 2018).
11. The authors would like to thank Gema Ramírez-Sánchez for her explanations.
Figure 4 shows the source text, candidate translations and reference transla-
tion that we will consider in what follows.
Figure 4: Main example for this section – source text, reference trans-
lation and candidate translations
What can a human evaluator say about these examples? Firstly, the term “bat-
tery pack” should be translated by “batterie”, unless the customer has speci-
fied otherwise. The translation “bloc-piles” (candidate D) is plainly wrong: this
transceiver does not function on “piles”, which are electrochemical cells designed
to be used once and then discarded, but rather on a “batterie”, that is, a
rechargeable pack of cells. In this case, the transceiver operates on a lithium-ion battery.
Talking of “bloc-batterie” (candidate A) is not intrinsically incorrect. Rather, it
is not idiomatic; it is a calque, i.e. an overly word-for-word translation, of the
English sentence. Secondly, the verbal form “is attached to” can just as well be
rendered by “est installée sur” or by “est fixée à”: this is a matter of personal pref-
erence. Thirdly, “transceiver”, which is a contraction of “transmitter-receiver”
should ideally be translated by “émetteur-récepteur”, as is the case in all trans-
lations shown in Figure 4. However, “radio” or even “appareil” (“device”) would
have worked just as well for the purposes of this translation project (for more on
translation and equivalence, see Kenny 2022 [this volume]). Now, let us take an
in-depth look at how AEMs would assess these candidate translations.
2.5.1.1 𝑛-grams
𝑛-grams (see Kenny 2022 [this volume]) are normally understood in translation
as n-word sequences. In our example sentence, “battery” is a 1-gram or unigram,
“battery pack” is a 2-gram or bigram and “battery pack is” is a 3-gram or trigram.
Other orders of n-gram are simply called 4-gram, 5-gram, etc., making “battery
pack is attached” a 4-gram.
𝑛-grams are commonly used in language modelling, where, for example, a tri-
gram probability states the probability of seeing a word given that you have
already seen the two words before it.
When we discuss AEMs, n-grams are merely n-word sequences in the candi-
date translation that also occur in the reference translation. More recently, AEMs
have been proposed which consider sequences of characters instead of words.
n-grams are then understood as sequences of n characters, rather than sequences
of n words.
We will be using the notion of n-grams as n-word sequences when discussing
BLEU (see 2.5.4.), and of n-grams as n-character sequences when discussing ChrF3
(see 2.5.5).
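Both senses of n-gram can be sketched in a few lines of Python; the example sentence and word are our own illustrations:

```python
def word_ngrams(tokens, n):
    """All word n-grams (as tuples) of order n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """All character n-grams of order n in a string, as used by
    character-based AEMs such as ChrF."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

tokens = "the battery pack is attached to the transceiver".split()
print(word_ngrams(tokens, 2)[:3])
# [('the', 'battery'), ('battery', 'pack'), ('pack', 'is')]
print(char_ngrams("battery", 3))
# ['bat', 'att', 'tte', 'ter', 'ery']
```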
Let us now work out the precision of each candidate. System A’s output has
five correct words out of a total of nine, which gives a precision of 0.56, or 56%.12
System B’s output has six correct words out of eight, so its precision score is 0.75,
or 75%. Finally, system D’s output has four correct words out of nine, which gives
a precision of 0.44, or 44%. According to this metric, system B’s output is better
than that of system A or system D.
Recall, in the same context, computes the ratio of correct words in the candi-
date to the total number of words in the reference:
recall = number of correct words in the candidate / total number of words in the reference
2.5.1.3 𝐹 -measure
The student in our example above could choose to prioritize precision over recall
and thus refuse to give any more answers after “Monday, Tuesday”, because they
do not want to risk giving a wrong answer. Alternatively, they might choose
to prioritize recall by blurting out tens of answers in the hope that enough of
them are actually correct. They thus might reply “Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday, Sunday, January, February, March, April, May, June,
July, August, September, October, November, December”. Their recall would now
shoot up to 100% as they would have given seven out of seven correct answers
(for the days of the week), but their precision would plummet to under 37%, as only
seven of the nineteen answers in their reply are correct. From the teacher’s
point of view, neither strategy is ideal. What the teacher wants is for the student
to optimize both precision and recall at the same time. They need a score that
combines both. This is where the F-measure comes in.
In mathematical terms, the F-measure is the harmonic mean of precision and
recall. It is computed as follows:
𝐹 = 2 ⋅ (precision ⋅ recall) / (precision + recall)    (3)
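The student example can be worked through in code. This is a minimal Python sketch, treating answers as sets (so a repeated answer counts once):

```python
def precision_recall_f(given, correct):
    """Precision, recall and F-measure for a set of answers
    against a gold-standard set (duplicates count once)."""
    true_positives = len(set(given) & set(correct))
    precision = true_positives / len(set(given))
    recall = true_positives / len(set(correct))
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]
months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Cautious student: two answers, both correct.
print(precision_recall_f(["Monday", "Tuesday"], days))   # P = 1.0, R ≈ 0.29
# Scattergun student: nineteen answers, seven correct.
print(precision_recall_f(days + months, days))           # P ≈ 0.37, R = 1.0
```

Neither strategy maximizes the F-measure, which rewards balancing the two.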
With the three metrics precision, recall and 𝐹 , the higher the score, the better
the MT output is deemed to be. However, these metrics work at the word level
and do not take word order into account.
TER is computed through a heuristic, iterative process, in which the algorithm
tries to find the best solution (the minimal number of steps required to go from
one sequence to another) by testing successive hypotheses. To calculate TER
manually, one can use a matrix. However, for the sake of explanation, we are
going to propose a shorter, if imperfect, way:13 let us compare each candidate
translation with the reference, and count the number of matches, shifts,
substitutions, additions and deletions. Remember, the number of matches will
not go into the final calculation. As we mentioned before, we consider hyphens
and apostrophes as word “breaks”. A tool that does not consider them as breaks
would treat “l’émetteur-récepteur” as one word rather than three and get a
different result.
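For readers who would like to experiment, the word-level edit distance behind WER (and hence, in our shift-free examples, TER) can be computed with standard dynamic programming. The candidate and reference below are our own illustrative pair, not those of Figure 4:

```python
def word_edit_distance(candidate, reference):
    """Minimum number of substitutions, insertions and deletions needed
    to turn the candidate into the reference. With no shifts this is the
    WER numerator; full TER additionally allows block shifts."""
    c, r = candidate, reference
    # dp[i][j] = edits needed to turn c[:i] into r[:j]
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(len(c) + 1):
        dp[i][0] = i
    for j in range(len(r) + 1):
        dp[0][j] = j
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if c[i - 1] == r[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[len(c)][len(r)]

candidate = "the battery pack is attached".split()
reference = "the battery is attached".split()
edits = word_edit_distance(candidate, reference)
print(edits, edits / len(reference))  # 1 edit; rate = 1/4 = 0.25
```

TER proper divides the edit count by the reference length, as in the last line.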
2.5.2.1 Candidate A
13. Note that our examples contain no shifts, so that TER equals WER here.
2.5.2.2 Candidate B
2.5.2.3 Candidate D
Again, remember that the number of matches will not go into the final calculation.
2.5.3.1 Candidate A
2.5.3.2 Candidate B
2.5.3.3 Candidate D
System A’s output gets an HTER score of 38%, system B’s an HTER score of
0% and system D’s an HTER score of 50%. Remember that because TER and HTER
are error rates, the lower the value, the better the MT output is deemed to be.
Thus, according to this metric, the best output would be candidate B.
Table 11: TER and HTER scores for each candidate translation
Now, compare the TER and HTER scores for each candidate translation in our
example (Table 11): they all get a lower, i.e. better, HTER than TER. Our example
confirms that “the edit rate between a machine translation and its postedited ver-
sion is dramatically lower than between the machine translation and an indepen-
dently produced human reference translation” (Koehn 2020: 52). This difference
could be taken as a reminder of the dangers of under-post-editing, which can
happen when post-editors work under too much time pressure.14
Figure 9: candidate and reference translation for the sentence Ceci n’est
pas une pipe, showing one 4-gram overlap.
Because the BLEU score computes the ratio of n-grams in the candidate trans-
lation that also occur in the reference translation, it is a precision metric. Table 12
thus presents the precision scores (expressed as a ratio and a decimal fraction)
for each order of n-gram in our candidate translation.16
Table 12: Precision (from 1-grams to 4-grams) for the candidate trans-
lation ‘That is not a pipe’.
Metric                 Ratio   Score
Precision (1-gram)     5/6     0.83
Precision (2-gram)     4/5     0.80
Precision (3-gram)     3/4     0.75
Precision (4-gram)     2/3     0.66
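The per-order precisions in Table 12 can be reproduced in a few lines of Python, assuming (as the counts in the table imply) that “This is not a pipe .” serves as the tokenized reference and that the final full stop counts as a token:

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams (as tuples) of order n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: matching n-grams over candidate n-grams."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    matches = sum(min(count, ref[g]) for g, count in cand.items())
    return matches, sum(cand.values())

candidate = "That is not a pipe .".split()
reference = "This is not a pipe .".split()
for n in range(1, 5):
    m, total = ngram_precision(candidate, reference, n)
    print(f"{n}-gram precision: {m}/{total}")
```

BLEU proper then combines these four precisions as a geometric mean and multiplies by a brevity penalty; tools differ in exactly how (Post 2018).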
The brevity penalty does not affect the score of the sentence in Figure 9, however,
as the candidate is the exact same length as the reference.
Although this metric is often referred to as “the BLEU score”, there are so many
different parameters that go into computing BLEU scores (Post 2018) that it can
be very difficult for non-specialists to find out and understand how exactly a
given AEM tool computes it. What is important for translators who wish to use
this AEM to assess MT outputs is that the scores they get for different candidate
translations are consistently calculated: put simply, make sure you use the same
MT evaluation tool, that you understand the settings it uses, and, if some of them
are user-definable, that you use the same settings when comparing candidate
translations using that AEM. This way, you will get comparable scores.
By way of illustration, Table 13 shows the BLEU scores computed by two dif-
ferent calculators, those provided by MutNMT and Tilde, for the candidate trans-
lations in Figure 4.18
Table 13: Sentence-level BLEU scores for candidate translations A, B
and D using MutNMT and Tilde
According to these scores, system B’s output is better than system A’s. This is
consistent with our findings so far. But the actual values vary dramatically and
the user would need to investigate why this might be the case.
2.5.5 ChrF3
The ChrF score is an F-measure based on character n-grams. Therefore, it is
based both on precision and recall. Remember the formula for the F-measure:
𝐹 = 2 ⋅ (precision ⋅ recall) / (precision + recall)    (13)
The formula for the ChrF score is:
ChrF𝛽 = (1 + 𝛽²) ⋅ (ChrP ⋅ ChrR) / (𝛽² ⋅ ChrP + ChrR)    (14)
18. https://mutnmt.prompsit.com/index; MutNMT uses the SacreBLEU algorithm (Post 2018). Tilde’s “interactive BLEU score evaluator” is available at https://www.letsmt.eu/Bleu.aspx.
where
• ChrP is the character 𝑛-gram precision, i.e. the number of correct char-
acter 𝑛-grams in the candidate translation divided by the total number of
𝑛-grams in the candidate translation,
• ChrR is the character 𝑛-gram recall, i.e. the number of correct character 𝑛-
grams in the candidate translation divided by the total number of character
𝑛-grams in the reference translation, and

• 𝛽 is a parameter that assigns 𝛽 times more weight to recall than to precision.
The ChrF3 score, then, is a variant of ChrF where 𝛽 = 3, i.e. recall has three
times more weight than precision. According to Popović (2015), experiments have
shown that ChrF, and especially ChrF3, represent promising metrics for auto-
matic evaluation of MT output.
As with BLEU, we will not calculate ChrF3 scores here. However, it is interest-
ing to compare the scores given by an AEM tool.
Table 14 shows the ChrF3 scores of candidates A, B and D, as computed by
MutNMT.
Table 14: ChrF3 score for candidate translations A, B and D
According to these scores, system B’s output is once again rated better than
system A’s and system D’s.
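For readers curious about the mechanics, here is a deliberately simplified ChrF sketch in Python. Real implementations (e.g. the one in sacreBLEU) differ in details such as whitespace handling and n-gram order, so its scores will not match those in Table 14:

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-grams of order n; spaces are dropped as a simplification."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(candidate, reference, max_n=6, beta=3.0):
    """Simplified ChrF: character n-gram precision and recall averaged
    over orders 1..max_n, with recall weighted beta times more."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        cand = char_ngrams(candidate, n)
        ref = char_ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        if sum(cand.values()):
            precisions.append(overlap / sum(cand.values()))
        if sum(ref.values()):
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    chrp = sum(precisions) / len(precisions)
    chrr = sum(recalls) / len(recalls)
    if chrp + chrr == 0:
        return 0.0
    return (1 + beta**2) * chrp * chrr / (beta**2 * chrp + chrr)

print(chrf("That is not a pipe.", "This is not a pipe."))
```

With beta=3, this is ChrF3: recall dominates, so missing content is penalized more heavily than extra content.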
We have seen that the scales of AEMs differ: 0 could be the best score and 1 the worst for one
metric (e.g. TER) while 1 could be the best score and 0 the worst for another (e.g.
BLEU).
We have also seen that caution should be exercised when comparing metrics
computed with different tools: indeed, the algorithms behind apparently similar
AEMs might differ in ways the non-specialist user is unaware of.
Finally, it should be said that to make the most of these different metrics, users
will need to reflect on what the scores mean for their purposes and get used to
them. Comparing different AEMs and combining them with human evaluation
will help, even though you might be faced with differences (Doherty 2017: 134).
Human evaluation can take into account context in a better way, provided that
it is not done with segments presented in a random order: evaluators can then
look for errors in pronouns, for instance, while AEMs mostly operate at word and
sentence level. One interesting way to combine measures is to use both HTER,
which gives you a measure of technical post-editing effort, and a temporal mea-
sure, which tells you how long post-editing took.20 O’Brien (2022 [this volume])
gives an overview of measures of post-editing effort.
type-token ratio = no. of types / no. of tokens    (15)

or

type-token ratio = (no. of types / no. of tokens) ⋅ 100    (16)
20. While averaging them into a single score might not be very telling, looking for correlations could help to identify the most serious problems.
The first method gives results ranging from 0 to 1, while the second gives
percentages, ranging from 0% to 100%. The higher the type-token ratio, the more
varied the vocabulary in the text under scrutiny.
However, several warnings have to be issued regarding this metric. Firstly,
TTR is highly sensitive to text length. Indeed, the longer a text is, the more often
such words as determiners and articles will be repeated. Moreover, because texts,
especially specialized texts, have a thematic unity, terms are repeated. Therefore,
the longer the text segment under consideration, the lower the TTR. Because of
this sensitivity of TTR to text length, TTR may have to be standardized across
blocks of a given number of tokens (e.g. 1,000 tokens) depending on the task at
hand. Standardizing in this way would allow you to compare the TTR of your
machine translated corpus with that of a corpus of different length in the same
(target) language.
Secondly, while lemmatization does not matter when comparing texts in the
same language, texts have to be lemmatized before computing TTR when comparing
two or more languages, as some languages have richer inflectional morphology
than others and would thus be expected to show more lexical variety, simply
because they have, for example, more forms for any given verb. If you are simply
using standardized TTRs to compare the lexical variety of machine translated
texts with that of other texts in the same language, however, then lemmatization
will not be necessary.
Lastly, bear in mind that a higher TTR, that is, one that indicates more lexical
variety, does not necessarily equate with higher complexity. For example, con-
sider the sentences “The girl saw a fire.” and “The lexicographer observed the
conflagration.” Both sentences are made up of five words (tokens), but while the
former has five types, the latter has only four (because the token “the” occurs
twice). The first sentence thus has a TTR of 1 or 100%, while the second has a
TTR of 0.8 or 80%. But in spite of being less varied than the first sentence, the
second sentence is more complex.21
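The toy comparison above, together with the block-wise standardization mentioned earlier, can be sketched as follows (tokens are lowercased and stripped of punctuation beforehand, purely for illustration):

```python
def type_token_ratio(tokens):
    """TTR: number of distinct word forms divided by total word count."""
    return len(set(tokens)) / len(tokens)

def standardized_ttr(tokens, block_size=1000):
    """Mean TTR over consecutive complete blocks of equal size, which
    removes most of the text-length effect discussed above."""
    blocks = [tokens[i:i + block_size]
              for i in range(0, len(tokens) - block_size + 1, block_size)]
    return sum(type_token_ratio(b) for b in blocks) / len(blocks)

s1 = "the girl saw a fire".split()
s2 = "the lexicographer observed the conflagration".split()
print(type_token_ratio(s1))  # 1.0  (5 types / 5 tokens)
print(type_token_ratio(s2))  # 0.8  (4 types / 5 tokens)
```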
As already indicated, segment-level comparisons of TTRs might not make
much sense, but at text or corpus level, same-language comparisons of standard-
ized TTRs could give us valuable information, depending on the kind of text we
are dealing with. This chapter has focused on specialized translation. But there
are different kinds of specialized translation, which follow different conventions.
Unlike literary or marketing translation, where higher lexical variety (and
thus a higher TTR) could be associated with higher quality and make the reading
all the more pleasant for the user of the target text, technical translation often has
to comply with certain conventions that tend to decrease the lexical variety of
21. The authors would like to thank Dorothy Kenny for this comment.
texts while making them easier to use for the end user. The main example used
in this chapter comes from a user manual, which means in turn that it should
follow such conventions as using a single term for a single concept, with no vari-
ation, and that instructions should as far as possible be written following the
same pattern. For example, if “transceiver” were translated at times by “émetteur-
récepteur”, and at others by “radio” or by “appareil”, this would lead to a higher
TTR, while introducing uncertainty for the end user.
This leads us to conclude this section with a second word of caution.
3 Conclusion
In this chapter we have sought to illustrate what a pragmatic approach to MT
evaluation implies for specialized translators or trainees. This approach has been
called pragmatic because it considers evaluation as a means to an end, and im-
plies choosing among different methods depending on the situation, often using
a combination of human and automatic evaluation.
While the comparison of MT outputs has been used as a method through-
out this chapter, it is worth noting that specialized translators are rarely given
a choice about what evaluation metric to use in current translation scenarios.
Rather, they often need to make a quick judgement on whether a given MT so-
lution is fit for purpose, or provide a general assessment of its quality.
We have thus explained how evaluations of MT outputs might be conducted,
using a combination of human and automatic evaluation metrics. We have ex-
plained the latter in great detail because we believe that, for all their limitations,
they can be put to good use if understood properly, and combined with human
evaluation.
References
Aziz, Wilker, Sheila Castilho & Lucia Specia. 2012. PET: A tool for post-editing
and assessing machine translation. In Proceedings of the eighth international con-
ference on language resources and evaluation (LREC’12), 3982–3987.
Castilho, Sheila. 2020. On the same page? Comparing inter-annotator agreement
in sentence and document level human machine translation evaluation. In Pro-
ceedings of the 5th conference on machine translation (WMT), 1150–1159. https:
//aclanthology.org/2020.wmt-1.137.pdf.
Castilho, Sheila, Stephen Doherty, Federico Gaspari & Joss Moorkens. 2018. Ap-
proaches to human and machine translation quality assessment. In Joss Moorkens,
Sheila Castilho, Federico Gaspari & Stephen Doherty (eds.), Translation
quality assessment: From principles to practice, 9–38. Cham: Springer.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Iacer Calixto, John Tinsley &
Andy Way. 2017. Is neural machine translation the new state of the art? The
Prague Bulletin of Mathematical Linguistics 108. 109–120. DOI: 10.1515/pralin-
2017-0013.
Castilho, Sheila & Sharon O’Brien. 2016. Evaluating the impact of light post-
editing on usability. In 10th international conference on language resources and
evaluation (LREC), 310–316. May 2016, Portorož, Slovenia. ELRA.
Doherty, Stephen. 2017. Issues in human and automatic translation quality assess-
ment. In Dorothy Kenny (ed.), Human issues in translation technology, 131–148.
London: Routledge.
Drugan, Joanna. 2013. Quality in professional translation: Assessment and improve-
ment. London: Bloomsbury.
Gouadec, Daniel. 2010. Quality in translation. In Handbook of translation studies.
Volume 1, 270–275. John Benjamins Publishing Company.
Grbić, Nadja. 2008. Constructing interpreting quality. Interpreting 10(2). 232–257.
House, Juliane. 2015. Translation quality assessment: Past and present. London:
Routledge.
Kenny, Dorothy. 2022. Human and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 23–49. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759976.
Koehn, Philipp. 2010. Statistical machine translation. Cambridge: Cambridge Uni-
versity Press.
Koehn, Philipp. 2020. Neural machine translation. Cambridge: Cambridge Uni-
versity Press.
Mariana, Valerie, Troy Cox & Alan Melby. 2015. The multidimensional quality
metric (MQM) framework: A new framework for translation quality assess-
ment. The Journal of Specialised Translation 23. 137–161.
Moorkens, Joss. 2018. What to expect from neural machine translation: a practical
in-class translation evaluation exercise. The Interpreter and Translator Trainer
12(4). 375–387.
Moorkens, Joss. 2022. Ethics and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 121–140. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759984.
O’Brien, Sharon. 2022. How to deal with errors in machine translation: Post-
editing. In Dorothy Kenny (ed.), Machine translation for everyone: Empower-
ing users in the age of artificial intelligence, 105–120. Berlin: Language Science
Press. DOI: 10.5281/zenodo.6759982.
Popović, Maja. 2015. chrF: Character n-gram F-score for automatic MT evaluation.
In Proceedings of the tenth workshop on statistical machine translation, 392–395.
Association for Computational Linguistics. DOI: 10.18653/v1/W15-3049.
Post, Matt. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the
third conference on machine translation (WMT), Volume 1: research papers, 186–
191. Association for Computational Linguistics. DOI: 10.18653/v1/W18-6319.
Qin, Ying & Lucia Specia. 2015. Truly exploring multiple references for machine
translation evaluation. In Proceedings of the 18th annual conference of the Eu-
ropean Association for Machine Translation, 113–120. https://aclanthology.org/
W15-4915/.
Snover, Matthew, Bonnie Dorr, Rich Schwartz, Linnea Micciulla & John Makhoul.
2006. A study of translation edit rate with targeted human annotation. In Pro-
ceedings of the 7th conference of the Association for Machine Translation in the
Americas: Technical papers, 223–231. Cambridge, Massachusetts: Association
for Machine Translation in the Americas. https://aclanthology.org/2006.amta-
papers.25/.
Toral, Antonio. 2019. Post-editese: An exacerbated translationese. In Proceedings
of machine translation summit XVII, 273–281. EAMT. https://www.aclweb.org/
anthology/W19-6627/.
Williamson, Graham. 2009. Type-token ratio. Last retrieved 5 Dec. 2020. https://www.sltinfo.com/wp-content/uploads/2014/01/type-token-ratio.pdf.
Yamada, Masaru. 2019. The impact of Google neural machine translation on post-
editing by student translators. The Journal of Specialised Translation 31. 87–106.
Chapter 4
Selecting and preparing texts for
machine translation: Pre-editing and
writing for a global audience
Pilar Sánchez-Gijón
Universitat Autònoma de Barcelona
Dorothy Kenny
Dublin City University
Neural machine translation (NMT) is providing more and more fluent translations
with fewer errors than previous technologies. Consequently, NMT is becoming a
real tool for speeding up translation in many language pairs. However, obtaining
the best raw MT output possible in each of the target languages and making texts
suitable for each of the target audiences depends not only on the quality of the MT
system but also on the appropriateness of the source text. This chapter deals with
the concept of pre-editing, the editing of source texts to make them more suitable
for both machine translation and a global target audience.
1 Introduction
Put simply, pre-editing involves rewriting parts of source texts in a way that is
supposed to ensure better quality outputs when those texts are translated by ma-
chine.1 It may involve applying a formal set of rules, sometimes called controlled
1. As discussed in Rossi & Carré (2022 [this volume]), quality is not a fixed concept; rather, judgments about quality depend on a whole host of factors, including the intended purpose of a translation. For a detailed discussion of this highly mutable concept, see Drugan (2013) and Castilho et al. (2018).
Pilar Sánchez-Gijón & Dorothy Kenny. 2022. Selecting and preparing texts
for machine translation: Pre-editing and writing for a global audience. In
Dorothy Kenny (ed.), Machine translation for everyone: Empowering users in
the age of artificial intelligence, 81–103. Berlin: Language Science Press. DOI:
10.5281/zenodo.6759980
language rules, which stipulate the specific words or structures that are allowed
in a text, and prohibit others (see, for example, O’Brien 2003). Alternatively, it
can involve applying a short list of simple “fixes” to a text, to correct wrong
spellings, or impose standard punctuation, for example. Depending on the con-
text, it might involve both of the above. Whatever the case, its main purpose, as
understood here, is to improve the chances of getting a better quality target text
once the source text has been machine translated. In cases where a source text is
to be translated into multiple target languages, the benefits of pre-editing should,
in theory, be observed over and over again in each of the target language texts.
It is thus traditionally recommended in multilingual translation workflows.
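By way of illustration, a “simple fixes” pre-edit of the kind just described might look like this in Python; the rules shown are our own examples, not a published controlled language:

```python
import re

def preedit(text):
    """A minimal pre-editing pass: standardize whitespace, quotes
    and spacing around punctuation before sending text to MT."""
    text = re.sub(r"\s+", " ", text).strip()         # collapse whitespace
    text = text.replace("“", '"').replace("”", '"')  # standardize quotes
    text = text.replace("’", "'")
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)     # no space before punctuation
    return text

print(preedit("Attach  the battery pack ,  then charge it ."))
# "Attach the battery pack, then charge it."
```

In a multilingual workflow, a pass like this runs once on the source text, and its benefits are (in theory) repeated in every target language.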
Another way to ensure that a text is translatable is to write it that way in
the first place. Writers whose work will ultimately be translated into multiple
languages are thus often asked to write with a global audience in mind. As well
as applying principles of “clear writing”, they are asked, for example, to avoid
references that may not be easily understood in cultures other than their own.
This applies also to writers whose work will be read in the original language by
international readers who are not native speakers of that language.
Given their similar aims, it is not surprising that there is often overlap be-
tween pre-editing rules, controlled languages and guidelines for clear writing or
writing for global audiences. In this chapter, we give an overview of the kind
of guidance commonly encountered in such sources, without attempting to be
exhaustive. The reader must also remember that such guidance is always lan-
guage specific: advice about the use of tense forms, for example, applies only
to languages that have grammatical tense. Many do not. Guidance can also be
language-pair specific or specific to a particular machine translation (MT) type
or engine. A construction that caused problems in rule-based MT (RBMT) may
no longer be an issue in neural MT (NMT), or it might be associated with errors
in a neural engine trained on legal texts but not one trained on medical texts.
In the case of writing with MT in mind, what turns out to be useful advice thus
depends heavily on the context.
The advent of NMT, in particular, has made us rethink the usefulness of ad-
vice on pre-editing and controlled writing (see Marzouk & Hansen-Schirra 2019
and §2 below), but for much of the history of MT, pre-editing helped ensure the
success of the technology. A good knowledge of MT made it possible to predict
those aspects of the source language or the source text that would likely gener-
ate errors in translations produced by a given MT system, whether rule-based
or statistical. However, one of the aspects that characterize NMT is precisely its
lack of systematic error: it can be difficult to predict with any certainty what
type of error will occur, and so attempting to pre-empt particular errors may be of limited use.
4 Selecting and preparing texts for machine translation
Pilar Sánchez-Gijón & Dorothy Kenny
Other studies suggest that pre-editing is simply not an effective strategy with
NMT systems. Marzouk & Hansen-Schirra (2019), for example, found that pre-
edits improved the performance of an RBMT, an SMT, and a hybrid MT system, in
the context of German-to-English technical translation, but they did not improve
the performance of the NMT system they tested.2 Among the few studies that
are enthusiastic about pre-editing in the context of NMT is that by Hiraoka &
Yamada (2019). They applied just three pre-editing rules to Japanese TED Talk
subtitles.
Given the lack of clear research evidence to support the use of pre-editing in
NMT workflows, industrial users of NMT are best advised to test the effects of
pre-edits carefully before promoting their use in production environments. As
indicated in the Introduction to this chapter, they may find that certain edits are
useful only for particular language pairs, given particular genres and particular
NMT engines and the training data they are based on.
Rules concerning lexical choice or the use of pronouns like it probably still have some potential for application, while rules of a syntactic nature may be unnecessary.
The straightforward nature of technical specifications contrasts with mar-
keting and legal materials associated with the same product, which might
need adaptation to make them more acceptable to potential buyers, or com-
pliant with the target legal framework. Indeed, legal translation provides
one of the best examples of a domain where it is sometimes necessary to
“rethink” a text completely in translation, so that it can be accommodated
by a new conceptual system.
For more on the domain- and genre-specific nature of translation, see Olohan (2015) and Šarčević (1997).
Low-risk internal documentation: These are texts which have very low visibility
and where the consequences of less-than-optimal translation are not serious (see Canfora & Ottmann 2020 and Moorkens 2022 [this volume] for more detailed discussions of risk in MT use). They may even be limited to
use within the user’s or client’s company. A priori, considerations such as
naturalness or fluency in the target language are less relevant than would
otherwise be the case (although NMT generally produces quite fluent out-
put anyway), but companies may still wish to control lexical selection and
lexical variability.
Low-risk external documentation: This refers to texts that are consulted only oc-
casionally or sporadically, or texts that are used as a help database or simi-
lar, and that are often not produced by the client, but by the community of
users of its service or product. In many such cases, the MT provider may
explicitly deny liability for any losses caused by faulty translations.
MT is not usually recommended for texts of a more visible nature whose
purpose is not just to inform or give instructions but also to be “appella-
tive”, that is, to arouse a particular interest in the reader, for example, in
a certain brand, or to elicit a certain behaviour. In other words, the more
informative a text is, the more it limits itself to the literalness of its mes-
sage, the less implicit information it contains and the less it appeals to
references linked to the reader’s culture or social reality, the greater the
expected success of MT.
Translation typically aims to have a similar effect on the reader of the target text as the source text did on its reader,
to the extent that this is possible. It is a question of making a text available to a
global audience and attempting to have the same effect on readers in each target
language.
Whether or not the text is to be translated with an MT system, it has long been considered advisable, from a communication perspective, to have the eventual translation in mind already during the drafting phase of the source text. In fact, the preparation of documentation for translation forms part of the training of technical writers (Maylath 1997).
Over the last 50 years, the translation industry, and all related interested parties from translators to major technology developers and distributors, have learned that the best translation strategy requires appropriate internationalization of the product (Fry 2003: 14). The best way to adapt a product to any other region is to exclude those aspects that are unique to the region where it is designed and developed. In this way, any digital product can be localized and used in the target language and on any device or platform, without its original design having to be modified. Something similar also appears to have happened with texts designed to be published in different languages.
Both language service providers and developers of localized digital products
have found that pre-editing source texts is the key to their global communica-
tion strategy. Many language service companies advertise on their websites that
good multilingual communication strategies begin with developing an appropri-
ate source text. Digital product developers have likewise discovered that the best
strategy for communication with their users and potential customers is based on
keeping a global user in mind. This strategy is embodied in a set of guidelines
that should be taken into account when drawing up the contents of any text.
Google’s documentation style guide, for example, features a basic “writing for
a global audience” principle, and sets out a series of guidelines in English that
facilitate the translation of documentation into any target language. These include, among others, general dos and don'ts such as: use the present tense, provide context, avoid negative constructions when possible, write short sentences, use clear, precise, and unambiguous language, and be consistent and inclusive (Google 2020).
Today’s translation technologies make it possible to combine the use of com-
puter-aided translation tools like translation memory tools (see Kenny 2022: §4
[this volume]) and MT systems. So, the limitations of MT in this sense are not
technological, but rather determined by the quality of the raw MT output (is it
error-free?) and appropriateness for the target communicative context (register,
tone, genre conventions, and any other issues relevant for the translation to fulfil
its communicative function).
In the case of textual genres that formally follow very rigorous conventions
and essentially have an informative or instructive communicative function (for
example, technical documentation, or similar), MT produced by a quality transla-
tion engine can give good or very good results, depending on the language pair
and other factors. In these cases, “pre-editing” can be limited to spellchecking the
source text, since these genres do not usually involve stylistic or referential fea-
tures (see below) that take them outside the realm of standard and non-complex
source text use.
However, genres which have a mixture of more than one communicative func-
tion, for example the recently popular “unboxing” videos for technical gadgets,
which are often both instructive (informative) and entertaining (appellative and
expressive), are not so simple to deal with using MT.
Texts belonging to yet other genres may contain references to the social, eco-
nomic or cultural life of their source communities that allow source text readers
to identify with the text, but may not have the same effect on the target text
language reader (see §6.4 on referential elements). Other possible obstacles for
some MT engines include rhetorical and stylistic devices (contractions, abbrevi-
ations, neologisms, incomplete sentences, etc.), that shape the source text, and
with which the source text readers can identify.
NMT allows users to obtain translations with fewer and fewer errors of fluency
or adequacy. It enables translations to be completed very quickly. Moreover, it
seems to achieve excellent results when translating many different text genres.
But a text written grammatically in the target language and without translation
errors may still not be an appropriate translation. Pre-editing makes it possible
to ensure the appropriateness of the translation with a global audience in mind.
Currently, this phase is seldom used in the translation industry. In the past, some global companies using SMT or RBMT pre-edited their original texts to avoid recurring translation errors produced by their own systems. With NMT, pre-editing may
become widespread in the industry as part of a strategy that not only avoids
translation errors but also contributes to making the raw MT output appropriate
to the contexts of use of the target translation.
6 Pre-editing guidelines
6.1 Opening remarks
Pre-editing applies a series of specific strategies to improve MT results when preparing content for a global audience or in controlled domains, and helps to ensure clear communication in such settings. In this context, the predominant textual type is informational, where there is no creative or aesthetic use of language but rather a literal and unambiguous use with the intention of either informing or instructing the text's
recipient. The following are the most common guidelines used in communica-
tion for a global audience, and are the basis for pre-editing strategies. The aim
of most of these guidelines is to increase MT effectiveness in producing gram-
matically correct translations that reproduce the source text “message” and also
to obtain translations that are appropriate to the communicative situation of the
receiver according to the text function and the context in which it is used. These
guidelines can be grouped into three different categories:
1. Lexical choice
2. Style and structure
3. Referential elements
Whatever the case, the success of pre-editing will be determined by two con-
siderations. First, the function of the (source and target) text: the greater the
predominance of the informative or instructive function over the phatic or aes-
thetic functions, the more sense it makes to pre-edit the original text. Second,
the kinds of errors that the chosen MT system typically produces in its raw output and that should be avoided or minimized by pre-editing the source text.
Pre-editing has two objectives: to prepare the original text so that the most
error-free possible raw MT output can be obtained, and also to prepare the orig-
inal text so that its translation through MT is suitable for a global audience. The
pre-editing guidelines presented in this section respond to these two objectives.
6.2 Lexical choice

An appropriate choice of words in the source text can contribute not only to avoiding translation errors, but also to complying more effectively with the linguistic usage that accords with the function of the text and the reason for its publication. Table 1 contains typical guidelines related to the lexicon.
Table 1: Typical lexical pre-editing guidelines

Avoid lexical shifts in register: Avoid words that can change the style of the text or the way it addresses the receiver. This facilitates understanding of the text and normalizes the way the receiver is addressed.

Avoid uncommon abbreviations: Only use commonly found abbreviations. Avoid abbreviated or reduced forms that cannot be easily translated from their immediate context.

Avoid unnecessary words: Avoid words that are not needed to transmit the required information. Using more words than needed means that the NMT system handles more word combinations and has more opportunities to propose an inappropriate or erroneous translation.

Be consistent: Use terminology in a consistent and coherent way. Avoid introducing unnecessary word variation (that is, avoid synonymy).
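Some of these lexical guidelines lend themselves to simple automated checks. The Python sketch below flags discouraged synonyms against a small glossary; the glossary entries and the function name are invented for illustration and are not part of any real authoring tool:

```python
import re

# Purely illustrative glossary: preferred term -> discouraged synonyms.
# A real glossary would be project- and domain-specific.
GLOSSARY = {
    "folder": ["directory"],
    "click": ["press", "hit"],
}

def flag_inconsistent_terms(sentence):
    """Return (discouraged, preferred) pairs found in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    findings = []
    for preferred, variants in GLOSSARY.items():
        for variant in variants:
            if variant in words:
                findings.append((variant, preferred))
    return findings
```

A writer (or a batch QA step) can then replace each flagged variant with its preferred term before the text is sent to the MT engine.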
In the case of NMT, the options adopted in the source text activate or inhibit translation options. An unnecessarily complex or ambiguous text structure that allows objectively different interpretations increases the possibility of the NMT system proposing correct translations of microstructural elements (terminology, phrases or syntactic units) which, when joined together in the same text, generate texts that are internally incoherent, suggest a different meaning to the source text, or are simply incomprehensible.
Table 2 gives pre-editing guidelines regarding the style and structure of the
text. Most of them are not only aimed at optimizing the use of NMT systems,
but also at the success of the translated text in terms of comprehensibility and
meaning.
Most of the guidelines listed in Table 2 are aimed at producing a simple text
that can be easily assimilated by the reader of the source text. In the case of NMT
engines trained with data sets already translated under these criteria, source text
pre-editing helps to obtain the best raw MT output possible. Note, however, that
if an engine is trained on “in-domain” data, that is, using a specialized and ho-
mogeneous dataset, based on texts of a particular genre and related to a partic-
ular field of activity (see Ramírez-Sánchez 2022: §2.1 [this volume]), then the
best possible pre-editing, if needed, will involve introducing edits that match the
characteristics of that genre and domain. In addition to this general advice, in
many cases it is also necessary to take into account guidelines that are specific
to the source or target language. This might mean avoiding formulations that are
particularly ambiguous, not only for the MT system, but also for the reader.
In English, for instance, avoiding ambiguous expressions means avoiding invisible plurals. A noun phrase such as "the file structure" could refer both to "the structure of files" in general and to "the structure of a particular file". Although this ambiguity is resolved as the reader moves through the text, the wording of the noun phrase itself is not clear enough to guarantee an unambiguous translation. Another ambiguous structure in many languages, not only in English, is the way in which negation is expressed. Sentences such as "No smoking seats are available" are notorious for giving rise to different interpretations and, consequently, incorrect translations.
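Checks for ambiguity patterns like these can also be approximated mechanically. In the Python sketch below, two rough heuristics flag the ambiguous-negation and invisible-plural patterns just discussed; the rules and the wording of the messages are invented for illustration and would need tuning for real use:

```python
import re

# Two rough heuristics for the ambiguity patterns discussed above.
AMBIGUITY_RULES = [
    (re.compile(r"^No \w+ing\b"),
     "ambiguous negation (cf. 'No smoking seats are available')"),
    (re.compile(r"\bthe \w+ structure\b", re.IGNORECASE),
     "possible invisible plural (cf. 'the file structure')"),
]

def check_sentence(sentence):
    """Return warning messages for ambiguity patterns found in the sentence."""
    return [message for pattern, message in AMBIGUITY_RULES if pattern.search(sentence)]
```

Such warnings do not rewrite anything; they simply prompt the writer to rephrase, e.g. "There are no seats in the smoking area".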
Verb tense forms are another aspect that may be simplified for the sake of in-
telligibility for the reader and error-free translations. Although the translation
of the different verb tense forms and modes does not necessarily pose a problem
for MT, an inappropriate use of verb tenses in the target language, despite resulting in well-formed sentences, can lead to comprehension errors in the target text. Typical guidance related to verb forms is given in Table 3.
Table 2: Typical pre-editing guidelines on style and structure

Short and simple sentences: Avoid unnecessarily complex sentences that introduce ambiguity. This makes it easier to understand the text, both the source and the translation. Syntactic structures based, for example, on anaphoric or cataphoric references may not be correctly handled by the NMT system and may lead to omissions or mistranslations. Avoid syntactic ambiguities subject to interpretation.

Complete sentences: Avoid eliding or splitting information. The compensation mechanisms for information that is not explicitly mentioned, typical of the source language, do not necessarily work in the target language. For instance, a sentence with a verb in passive form which does not make the agent explicit can lead to misunderstanding in target texts. The same can happen when one of the sentence complements is presented as a list of options (in a bulleted list, for example). In such cases, the sentence complement is broken down into separate phrases which the NMT system may process incorrectly. Remember that MT systems normally use the sentence as a translation unit (see Kenny 2022: §7 [this volume]), i.e., the text between punctuation marks such as full stops, or paragraph breaks.

Use parallel structures in related sentences: Use the same syntactic structure in sentences that appear in a list or in the same context (e.g., section headings, direct instructions). This kind of iconic linkage (see Byrne 2006) usually makes it easier to understand the text, both the source and the translation. In addition, it allows for the systematic identification of errors during a post-publishing phase.

Active voice: Where appropriate, use mainly the active voice or other structures that make "participants" in an action explicit (taking into account the conventions of the text genre and the languages involved).

Homogeneous style: Maintain a homogeneous style. This facilitates understanding of the text, both the source and the translation. This is particularly relevant to preparing texts for a global audience.
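One practical consequence of this guidance is that MT systems normally process one sentence at a time, treating the text between punctuation marks or paragraph breaks as the translation unit. The deliberately naive Python sketch below illustrates segmentation of that kind; real systems use far more robust tokenization, and this function is purely illustrative:

```python
import re

def naive_segment(text):
    """Split text into sentence-like translation units at terminal
    punctuation followed by whitespace, or at blank lines.
    Deliberately naive: abbreviations, ellipses etc. are not handled."""
    parts = re.split(r"(?<=[.!?])\s+|\n{2,}", text.strip())
    return [part for part in parts if part]
```

A bulleted complement split across list items would emerge from such a segmenter as separate, incomplete units, which is exactly why the "complete sentences" guideline matters.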
Table 3: Typical pre-editing guidelines on verb forms

Use the active voice: Where possible and appropriate, use the active voice.

Use simple verb tense forms, preferably the present or past simple: Depending on your language pair and the MT engine, you may wish to avoid using compound verb forms. Although the same compound form may exist in both languages, it may not be used in the same way and may lead to different interpretations.

Avoid concatenated verbs: Avoid unnecessary concatenations of verbs that make it difficult to understand and translate the text.
Table 4: QA in pre-editing
gender, racial, cultural, and all kinds of inclusivity in language use. This point is
particularly relevant in gender-inflected languages.
Preparing a source text for a global audience, or pre-editing, is carried out with
tools that assist the writer. Most text editing programs include the most basic
functions necessary to carry out pre-editing as well as QA. Other functions are
available only through dedicated authoring tools. Table 5 summarizes the main
functions of controlled language checkers that assist source text pre-editing.
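To give a concrete flavour of such functions, the Python sketch below approximates two typical controlled-language checks: a sentence-length limit and a crude passive-voice heuristic. The threshold and the regular expression are illustrative assumptions, not the behaviour of any particular commercial checker:

```python
import re

MAX_WORDS = 25  # illustrative limit; real style guides set their own

def controlled_language_report(sentence):
    """Apply two simple controlled-language checks to one sentence."""
    warnings = []
    word_count = len(sentence.split())
    if word_count > MAX_WORDS:
        warnings.append("sentence too long ({} words)".format(word_count))
    # Crude passive-voice heuristic: form of 'to be' + word ending in -ed/-en.
    if re.search(r"\b(is|are|was|were|been|being)\s+\w+(ed|en)\b", sentence):
        warnings.append("possible passive voice")
    return warnings
```

Dedicated checkers add dictionaries, terminology databases and grammatical analysis on top of pattern checks like these, but the basic report-and-revise loop is the same.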
Most editing programs include functions that allow this type of action to be performed to one degree or another. However, when pre-editing is part of a larger multilingual publishing workflow, dedicated controlled language checkers are usually preferred.
6 Various controlled language checkers and other writing aids are available. Commercial tools include acrolinx (https://www.acrolinx.com/) and ProWritingAid (https://prowritingaid.com/).
9 Concluding remarks
The main objective of MT as a resource in translation projects is to increase pro-
ductivity and, consequently, reduce the time needed to generate a good quality
translation. In this sense, pre-editing manages to optimize the source text content
so as to minimize errors in the translated text (when MT is used for assimilation)
and the editing needed to guarantee the expected quality (when MT is followed
by post-editing or used as a resource for human translation).
When NMT is capable of producing translations with virtually no fluency or
adequacy errors in informative or instructive texts, then the challenges for MT
go beyond these text types. However, translating texts with different commu-
nicative functions, such as for games or texts of a more appellative nature, is not
only a matter of avoiding errors. It is necessary to produce a translation that is
in line with the intention of the source text and with which the target reader can
identify in the same way as the source text reader. In this case, pre-editing takes
on an added value: the preparation of a text suitable for publishing multilingual
content.
As a strategy, pre-editing may play a certain role in foreign language learn-
ing. But its main environment is in multilingual content publishing. Although it
was originally part of translation workflows for technical documentation and the
like, the expansion of NMT could lead to pre-editing being applied to texts of a
more complex nature, or even to translators eventually putting their skills at the
service of the source text, instead of focusing on the target text, as has happened
throughout centuries of translation history.
References
Aixelá, Javier Franco. 2009. An overview of interference in scientific and technical translation. The Journal of Specialised Translation 11. 75–88. https://www.jostrans.org/issue11/art_aixela.pdf.
Bentivogli, Luisa, Arianna Bisazza, Mauro Cettolo & Marcello Federico. 2016.
Neural versus Phrase-Based Machine Translation quality: A case study. In
EMNLP 2016. arXiv:1608.04631v1.
Bowker, Lynne & Jairo Buitrago Ciro. 2019. Machine translation and global re-
search. Bingley: Emerald Publishing.
Byrne, Jody. 2006. Technical translation. Usability strategies for translating techni-
cal documentation. Dordrecht: Springer.
Canfora, Carmen & Angelika Ottmann. 2020. Risks in neural machine translation.
Translation Spaces 9(1). 58–77.
Castilho, Sheila, Stephen Doherty, Federico Gaspari & Joss Moorkens. 2018. Approaches to human and machine translation quality assessment. In Joss Moorkens, Sheila Castilho, Federico Gaspari & Stephen Doherty (eds.), Translation quality assessment: From principles to practice, 9–38. Cham: Springer.
Drugan, Joanna. 2013. Quality in professional translation: Assessment and improve-
ment. London: Bloomsbury.
Fry, Deborah. 2003. The localization industry primer. 2nd edition. Updated by
Arle Lommel. Féchy: LISA. https://www.immagic.com/eLibrary/ARCHIVES/
GENERAL/LISA/L030625P.pdf.
Gerlach, Johanna. 2015. Improving statistical machine translation of informal language: A rule-based pre-editing approach for French forums. Doctoral thesis. University of Geneva. https://archive-ouverte.unige.ch/unige:73226.
Ghiara, Silvia. 2018. El lenguaje controlado. La eficacia y el ahorro de las palabras
sencillas. https://qabiria.com/es/recursos/blog/lenguaje-controlado.
Google. 2020. Writing for a global audience. Google developer documentation style
guide. https://developers.google.com/style/translation.
Hiraoka, Yusuke & Masaru Yamada. 2019. Pre-editing plus neural machine translation for subtitling: Effective pre-editing rules for subtitling of TED talks. In Proceedings of Machine Translation Summit XVII: Translator, project and user tracks, 64–72. Dublin: European Association for Machine Translation. https://aclanthology.org/W19-6710.
Kenny, Dorothy. 2022. Human and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 23–49. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759976.
Marzouk, Shaimaa & Silvia Hansen-Schirra. 2019. Evaluation of the impact of
controlled language on neural machine translation compared to other MT ar-
chitectures. Machine Translation 33. 179–203. DOI: 10.1007/s10590-019-09233-w.
Maylath, Bruce. 1997. Writing globally: Teaching the technical writing student to
prepare documents for translation. Journal of Business and Technical Commu-
nication 11(3). 339–352.
Miyata, Rei & Atsushi Fujita. 2017. Dissecting human pre-editing toward better use of off-the-shelf machine translation systems. In Proceedings of the 20th annual conference of the European Association for Machine Translation (EAMT), 54–59. https://ufal.mff.cuni.cz/eamt2017/user-project-product-papers/papers/user/EAMT2017_paper_42.pdf.
Miyata, Rei & Atsushi Fujita. 2021. Understanding pre-editing for black-box neural machine translation. In Proceedings of the 16th conference of the European chapter of the Association for Computational Linguistics, 1539–1550. https://aclanthology.org/2021.eacl-main.132.pdf.
Moorkens, Joss. 2022. Ethics and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 121–140. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759984.
Navarro, Fernando A. 2008. La anglización del español: Mucho más allá de bypass, piercing, test, airbag, container y spa. In Luis González & Pollux Hernúñez (eds.), Traducción: Contacto y contagio. Actas del III congreso internacional «El español, lengua de traducción», 12–14 July 2006, 213–232. Puebla: ESLEtRA. https://cvc.cervantes.es/lengua/esletra/pdf/03/017_navarro.pdf.
O’Brien, Sharon. 2003. Controlling controlled English: An analysis of several controlled language rule sets. In Controlled language translation (EAMT/CLAW 2003), Dublin City University, 15–17 May 2003. https://aclanthology.org/2003.eamt-1.12.pdf.
O’Brien, Sharon. 2022. How to deal with errors in machine translation: Post-
editing. In Dorothy Kenny (ed.), Machine translation for everyone: Empower-
ing users in the age of artificial intelligence, 105–120. Berlin: Language Science
Press. DOI: 10.5281/zenodo.6759982.
Olohan, Maeve. 2015. Scientific and technical translation. London: Routledge.
Pérez-Ortiz, Juan Antonio, Mikel L. Forcada & Felipe Sánchez-Martínez. 2022.
How neural machine translation works. In Dorothy Kenny (ed.), Machine trans-
lation for everyone: Empowering users in the age of artificial intelligence, 141–164.
Berlin: Language Science Press. DOI: 10.5281/zenodo.6760020.
Ramírez-Sánchez, Gema. 2022. Custom machine translation. In Dorothy Kenny
(ed.), Machine translation for everyone: Empowering users in the age of artificial
intelligence, 165–186. Berlin: Language Science Press. DOI: 10.5281/zenodo.6760022.
Rossi, Caroline & Alice Carré. 2022. How to choose a suitable neural machine
translation solution: Evaluation of MT quality. In Dorothy Kenny (ed.), Ma-
chine translation for everyone: Empowering users in the age of artificial intelli-
gence, 51–79. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759978.
Šarčević, Susan. 1997. New approach to legal translation. The Hague: Kluwer Law International.
Seoane Vicente, Ángel Luis. 2015. Lenguaje controlado aplicado a la traducción automática de prospectos farmacéuticos. Doctoral thesis. URI: http://hdl.handle.net/10045/53587.
Seretan, Violeta, Pierrette Bouillon & Johanna Gerlach. 2014. A large-scale evaluation of pre-editing strategies for improving user-generated content translation. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC), 1793–1799. http://www.lrec-conf.org/proceedings/lrec2014/pdf/676_Paper.pdf.
Chapter 5
How to deal with errors in machine
translation: Post-editing
Sharon O’Brien
Dublin City University
Machine Translation output can be incorrect, containing errors that need to be fixed, especially if the text is destined for publication and needs to be error-free. The task of identifying and fixing these errors is called post-editing (PE). In this chapter, I provide an overview of the PE process, drawing on
both academic and industry sources. I explain how PE is generally divided into
light and full PE, and describe standard guidelines for each type, homing in on
issues that arise in the application of this classification. The chapter also surveys
the various types of interface used in PE (including word processing and spread-
sheet software, and professional computer-aided translation tools), and modes of
interaction (traditional, adaptive or interactive). Finally, concepts and tools used by
researchers into PE are described, and particular focus is put on the measurement
of temporal, technical and cognitive effort.
1 Definition
Machine Translation (MT) is an imperfect technology. For one sentence it might
produce an accurate and contextually acceptable translation, but the next sen-
tence might have a serious error in meaning, an omission, an addition, or a stylis-
tic problem. If MT is being used just to obtain the gist of the meaning from a text,
there may be no need to fix such errors. However, if MT is being used to create a
text for publication or widespread circulation within or outside an organisation,
it is usually necessary to fix any errors in the text. The identification of such
errors and their revision, or correction, is known as post-editing. The term was
Sharon O’Brien. 2022. How to deal with errors in machine translation: Post-
editing. In Dorothy Kenny (ed.), Machine translation for everyone: Empower-
ing users in the age of artificial intelligence, 105–120. Berlin: Language Science
Press. DOI: 10.5281/zenodo.6759982
be measured? And can light post-editing really be done without any attempt to
produce something comparable to human translation? Furthermore, it is unclear
how much longer full post-editing would take compared with light post-editing.
Such questions prompted the translation industry to describe these levels more
formally. For example, the Translation Automation User Society (TAUS) cre-
ated guidelines suggesting that full post-editing would include stylistic changes,
whereas light post-editing would not.
The TAUS guidelines (TAUS 2010) for light and full post-editing are listed in
Table 1, where comparable guidelines appear side by side, and an empty cell indicates that there is no comparable guideline in one of the sets.
According to the ISO 18587 standard (p. 6), the objectives of post-editing are to ensure:
where “TSP” stands for “translation service provider” and is defined as a “lan-
guage service provider that delivers translation services” (p.4).
These objectives can be attained by ensuring that the following criteria are
met (p.6):
• Correct formatting;
• Suitability for the target audience and for the purpose of the target lan-
guage content;
There are overlaps between the two sets of guidelines, though they prioritise
different aspects of the task. Taken together, they represent typical guidelines for
post-editing. The TAUS guidelines encourage as much re-use of raw MT output
as is practical, whereas the ISO guidelines focus more on agreements, standards
and suitability for the target audience. The notion of reusing as much of the raw
MT output as is possible is an essential aspect of the post-editing task. It is very
easy for a translator to simply ignore the MT output, delete it and translate the
source sentence directly. In fact, many translators are tempted to do this because
they believe that they can produce a better translation and that it will take less
time than post-editing. While the first belief was certainly true some time ago, the
development of neural machine translation has, in general, increased the quality
of MT to such an extent that raw output is now much more useful and usable. The
idea that translation would take less time than post-editing is, on the other hand,
open to debate. Studies have shown that post-editing can certainly be faster than
translation, even if translators believe that translating from scratch is faster (e.g.
Guerberof Arenas 2014). Being able to rapidly assess the output from an MT system,
decide whether it is usable, and determine what edits are required is a skill that can
be honed with practice.
Levels of post-editing are conceptually linked with levels of quality, though,
as will be shown in the examples below, this linkage is problematic. Light
post-editing is seen to be linked with "good enough quality", or text that is "merely
comprehensible”, i.e. text that should be accurately translated, but that does not
necessarily have to flow very naturally or be stylistically sophisticated. On the
other hand, full post-editing is linked with “quality similar or equal to human
translation”. Here again, we run into some difficulty because the inherent as-
sumption is that “human translation” is always of a high standard, something
that is frankly not always the case.
To better understand the complex, and sometimes confusing, relationship be-
tween levels of post-editing and levels of quality, let us look at an example:
The two errors in (a) are quickly fixed to render sentence (b). On the one hand,
we could say that we have edited (a) lightly; we implemented two rapid edits.
However, if we were to follow the light post-editing guidelines, we probably
would not implement any edits at all. Implementation of “full post-editing” guide-
lines would mean that the two errors must be fixed to produce a translation that
is semantically, syntactically and grammatically correct. So, with these two rapid
edits, have we engaged in light or full post-editing?
Let us take a look at another example.
3 Post-editing interfaces
At a basic level, post-editing can be done in any text editor where the source text
is visible and the “raw” MT output can be revised. This could even be a spread-
sheet, with the source text in one column and the MT output in an adjacent one.
These days, however, professional translation is typically done using computer-
aided translation (CAT) environments, especially translation memory (TM) tools.
As indicated by Kenny (2022: §4 [this volume]), translation memory is a database
that stores segments of texts that have been previously translated. A TM tool is
the software application that is used to access, edit and update the text in this
database. MT, as is evident from the other chapters in this book, is a different
type of technology, though the two are inevitably linked because contemporary
data-driven MT systems typically use the data stored in TMs as an important in-
put for machine learning. Additionally, given that TM tools are so commonly used
by translators in their daily work, MT technology is now linked to, if not completely
embedded in, TM tools such as Trados Studio, memoQ and MateCat, to name
but a few. From a practical perspective, this means that post-editing is frequently
carried out in a TM editing environment.
1 https://www.irishtimes.com/news/education/foreign-languages-could-be-taught-in-preschool-and-primary-department-1.4270886
As a result, the prefix "post" seems irrelevant, and "interactive MT" is a more accurate
term. As happens in many domains, the term "post-editing" is now well established,
so it may not disappear soon, but it will probably become defunct as time
goes by. The task itself – interacting with and fixing MT output – is less likely
to become defunct in the near future.
Apart from keyboard logging, technical effort is also measured using what are
called edit distance metrics. Put simply, edit distance counts the minimum number
of operations required to transform one string of text into another. An
"operation" could be deletion of a word, insertion of a word, or movement of a word
or phrase to another location. There are several metrics for measuring edit dis-
tance, each of which counts the operations slightly differently. One basic metric
is called the Levenshtein distance. It counts the minimum number of character
insertions, deletions or substitutions needed to transform one word,
phrase or sentence into another.
For example: Take the word drink and the word drunk. How many characters
have to change to transform one into the other? One: ‘i’ is substituted by ‘u’. Let
us make this a bit more complex: if we transform the phrase “He drinks” into “He
is drinking”, the Levenshtein distance is 6. (Insert ‘i’, ‘s’ and one space character
after ‘He’; substitute ‘i’ for ‘s’ at the end of ‘drinks’ and insert ‘n’ and ‘g’.)5 More
sophisticated edit distance measures can be deployed and one that is often used
to measure PE edit distance is called TER, or the Translation Edit Rate (Snover
et al. 2006). This can be measured on a scale of 0–1 or 0% to 100%. The lower the
score, the lower the PE effort. For example, a score of 30% means, approximately,
that 30% of the raw MT output was edited to create the post-edited version of
a text string. Challenges exist regarding how best to calculate edit distance and
consequently there are several different approaches, with different metrics being
proposed on a regular basis.
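The calculations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a reference implementation of TER: the simplified word-level score below ignores the block shifts that TER additionally counts as single edits.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions or substitutions
    needed to transform sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x -> y
        prev = curr
    return prev[-1]

def rough_edit_rate(hypothesis, reference):
    """Word-level edit distance normalised by reference length.
    NOTE: real TER also counts block shifts as single edits;
    this simplified score ignores them."""
    hyp, ref = hypothesis.split(), reference.split()
    return levenshtein(hyp, ref) / len(ref)

print(levenshtein("drink", "drunk"))               # 1
print(levenshtein("He drinks", "He is drinking"))  # 6
print(round(rough_edit_rate("He drinks", "He is drinking"), 2))  # 0.67
```

Running the sketch on the examples above reproduces the character-level distances of 1 and 6, and yields a rough word-level edit rate of 0.67 for the second pair.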
Temporal and technical effort are relatively easy to measure. Measuring the
third dimension – cognitive effort – is much more complex. Cognitive effort
refers to hidden cognitive processes such as reading, understanding, comparing
the meaning of the source text with that of the MT output, making decisions while
taking guidelines and expectations into account, and monitoring the text as it is
revised. These processes take place in the brain and cannot be seen or measured
directly. Nonetheless, cognitive effort is still an important aspect to consider.
Post-editing is sometimes reported as being a more demanding task than translation
without MT as an aid. This is probably due to the list of processes mentioned
above and also to the fact that it is a relatively new task for some. Even if translators
can produce text faster with MT, they may feel more tired than they would
if they produced the translation themselves. Working faster suits commercial
production, but not if it results in translator burnout, and that is why
cognitive demand is important to consider when measuring PE effort.
But how can we measure cognitive effort? In fact, this is a question for anyone
who seeks to measure cognitive effort for any task. Sometimes the effort can be
5 These calculations can be done online using, for example, https://planetcalc.com/1721/.
estimated by asking the person who performs the task to “think aloud” as they
work. By doing this, they can highlight cognitive difficulties they encounter. Of
course, thinking aloud as you work interferes with and slows down the task itself,
so there are disadvantages to this technique. An alternative approach is to record
the task on the computer screen as it unfolds, then to replay that as a video when
the task is completed, and ask the task performer to retrospectively discuss the
problems they encountered. This has the advantage of not slowing down the task
itself, but it has the disadvantage that the person may not remember all of the
issues they encountered. Finally, researchers have attempted to measure cogni-
tive effort in post-editing using eye tracking, a technology that records where
the eyes fall on the screen, as well as how long the eyes rest on parts of the text
(called fixation duration), and even pupil dilation, i.e. changes in pupil size.
These are known to be good measures of cognitive effort. Yet, the challenges are
obvious: you need expensive eye tracking technology, sophisticated knowledge
of how to use it and interpret the data it produces, and you need to control the
data collection environment so that users do not move their heads too much and
the lighting does not change substantially, because such changes affect pupil size, and
so on. Since measuring cognitive effort is a considerable challenge, understand-
ably few include it when they measure PE effort. Nonetheless, it is important
to recognise cognitive effort as an essential component of the effort involved in
post-editing.
There is a final note to add here on measuring PE effort. The amount of effort
should indirectly tell us something about the quality of the output produced by a
specific MT system, for a language pair and topic. Therefore, we can use PE effort
as a form of MT quality evaluation. The lower the quality from the MT system,
the more changes and time will be required. MT quality can be measured in other
ways, by, for example, identifying, classifying and counting the number of errors
produced. This is a useful form of MT quality evaluation but taking the PE effort
into consideration is potentially even more informative because it reveals how
easy or difficult it is to work with the MT output to produce a defined level of
quality.
to be used has become central to training, both for translation students and for
those who are not trained in translation (see Bowker & Ciro (2019) for a discus-
sion of “MT literacy”).
References
Bowker, Lynne & Jairo Buitrago Ciro. 2019. Machine translation and global re-
search. Bingley: Emerald Publishing.
de Almeida, Gisele & Sharon O’Brien. 2010. Analysing post-editing performance:
Correlations with years of translation experience. In Proceedings of the 14th
annual conference of the European Association for Machine Translation (EAMT 2010).
EAMT. http://www.mt-archive.info/10/EAMT-2010-Almeida.pdf.
Guerberof Arenas, Ana. 2013. What do professional translators think about post-
editing? Journal of Specialised Translation 19. 75–95. https://www.jostrans.org/
issue19/art_guerberof.php.
Guerberof Arenas, Ana. 2014. Correlations between productivity and quality
when post-editing in a professional context. Machine Translation 28. 165–186.
ISO. 2017. ISO 18587:2017. Translation services – post-editing of machine translation
output: Requirements. https://www.iso.org/standard/62970.html.
Kenny, Dorothy. 2022. Human and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 23–49. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759976.
Koponen, Maarit. 2016. Is machine translation post-editing worth the effort? A
survey of research into post-editing and effort. The Journal of Specialised Trans-
lation 25. 131–148. https://www.jostrans.org/issue25/art_koponen.pdf.
Krings, Hans P. 2001. Repairing texts: Empirical investigations of machine transla-
tion post-editing processes. Kent, Ohio: Kent State University Press.
Nitzke, Jean & Silvia Hansen-Schirra. 2021. A short guide to post-editing. (Transla-
tion and Multilingual Natural Language Processing 16). Berlin: Language Sci-
ence Press. DOI: 10.5281/zenodo.5646896.
O’Brien, Sharon & Silvia Rodríguez Vázquez. 2019. Translation and technology.
In Sara Laviosa & Maria González-Davies (eds.), Routledge handbook of trans-
lation and education, 264–277. London: Routledge.
Snover, Matthew, Bonnie Dorr, Rich Schwartz, Linnea Micciulla & John Makhoul.
2006. A study of translation edit rate with targeted human annotation. In Pro-
ceedings of the 7th conference of the Association for Machine Translation in the
Americas: Technical papers, 223–231. Cambridge, Massachusetts: Association
for Machine Translation in the Americas. https://aclanthology.org/2006.amta-
papers.25/.
Chapter 6
Ethics and machine translation
Joss Moorkens
Dublin City University
Neural machine translation (MT) can facilitate communication in a way that sur-
passes previous MT paradigms, but there are also consequences of its use. As with
the development of any technology, MT is not ethically neutral, but rather reflects
the values of those behind its development. In this chapter, we consider the eth-
ical issues around MT, beginning with data gathering and reuse and looking at
how MT fits with the values and codes of the translator. If machines and systems
reflect value systems, can they be explicitly “good” and remove bias from their out-
put? What is the contribution of MT to discussions of sustainability and diversity?
Rather than promoting an approach that involves following a set of instructions to
implement a technology unthinkingly, this chapter highlights the importance of a
conscious decision-making process when designing a data-driven MT workflow.
There is a tension between the idea that an action can be universally good and
moral, such as upholding justice or truthfulness, and the position that values may
differ depending on the person or group under examination. There have been
many suggestions for ways of untangling whether an action is ethical or uneth-
ical based on agency, relationships, or a surrounding narrative. This is where
theoretical ethics moves into applied ethics, in trying to guide how we should
act in a given situation.
Applied ethics in a working situation will often involve a set of codes or stan-
dards to guide professional behaviour. If these codes are too restrictive, they may
hamper potential progress or societal benefits. Rigid codes could also cause diffi-
culties as ethical decisions are rarely binary and choices may be governed by the
unique scenario and pressures brought to bear on the person making that choice.
For this reason, different fields of applied ethics have sprung up to consider com-
mon problems and dilemmas within their particular context. This chapter will
draw on the fields most relevant to machine translation (MT), including computer
and information ethics and data ethics, when discussing the ethical use of MT
by humans in system development. §3 on the ethical use of MT in professional
workflows will draw on business ethics and the growing literature on transla-
tion ethics. §4 on computers as ethical agents will draw on machine ethics and
computer and information ethics. The final sections will draw on more recent
diverse work on ethics and artificial intelligence when looking at sustainability
and diversity.
Ethics is a growing area of interest in technology in general, as technology
becomes an increasingly integral part of all of our lives and many regions move
towards ubiquitous computing. We need to be aware of the impact of the choices
we make when we design, implement, or use technology. There is an assump-
tion often expressed that technology is ethically neutral and that bias may be
introduced only in our use of that technology. However, the consensus among
ethicists and philosophers of science is that technology is not ethically neutral,
but rather reflects the values of the designer. These values govern the problem
addressed by the technology, the decision to create the technology, the method
of implementation, its intended users, the references or training data used, the
processing of that data, the location and security of data storage, and the limits
to access to the technology based perhaps on cost or geographical location.
The speed and scale of technological development means that regulation is
inevitably a step or two behind and we are thus reliant on ethical behaviour on
the part of engineers and developers. We rely, to a greater or lesser extent, on
large technology companies with political power and wealth to act in our col-
lective best interests, but a series of reports and revelations in recent years have
data for MT training and offer it for sale. The data should have all personal infor-
mation removed before being shared (see §2.5), but this information is retained
by accident when the data are uploaded to one purchaser. The company tries to
keep this quiet so as to avoid liability.
What ethical issues can you identify in these two scenarios? What would
change if the employers or translators made different ethical decisions? In the
following subsections, we look at data ownership, permissions, distribution, pri-
vacy, and legal frameworks for data sharing. These subsections will, it is hoped,
help guide your thinking about the above questions.
At scale, big translation data has become a valuable resource for MT and machine
learning system training (Moorkens & Lewis 2019a). This does not mean
that translators receive any secondary payment, however, and the granular reuse
in MT training means that the source of training data is usually not identifiable.
This is also true of data gathered by webcrawling for parallel texts. Translators A
and B in our case study probably have little option other than to hand over their
translation data and to accept the consequences, especially considering that most
translators work on a freelance basis, and thus have limited scope for argument
with their employers. It is reasonable to argue that a more equitable system of
data ownership would contribute to the sustainability of the translation industry
(see also §5).
when and where to reuse the data based on the translator, the project, or the
creation date. Translator activity data, including more detailed timings, editing
actions, and records of individual keystrokes may also be recorded, particularly
when a proprietary web-based platform is used, as in Translator A’s project. Such
data can be useful for monitoring translators’ work, but is commonly removed
for MT training so that only parallel sentence pairs are used. Once any possible
identifying metadata (data about data) are removed, preferences for future use
or reuse cannot be recorded and individual contributions cannot be measured,
even if there is a retrospective change to agreements that means that contribu-
tions should earn a royalty. On the other hand, this will improve anonymization
of translation data, which is important if the data are to be shared or exchanged.
2 Please see the introduction to this volume for more on machine learning. For the purposes of this chapter, we understand machine learning as the use of computers to achieve an end by inference from big data rather than from the input of an explicit command.
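The metadata-removal step described above can be illustrated with a minimal Python sketch. The record structure and field names here are hypothetical, purely for illustration; real TM exchange formats such as TMX carry this information in their own attribute sets.

```python
def anonymize(tm_records):
    """Strip identifying metadata from translation memory records,
    keeping only the parallel sentence pairs used for MT training.
    Field names ('source', 'target', etc.) are hypothetical."""
    return [(rec["source"], rec["target"]) for rec in tm_records]

records = [
    {"source": "All changed, changed utterly.",
     "target": "Tout a changé, changé du tout au tout.",
     "translator_id": "T-042",            # identifying metadata
     "created": "2021-03-01T09:30:00"},   # identifying metadata
]
pairs = anonymize(records)
# pairs now contains only the sentence pair: nothing traces the
# contribution back to an individual translator, but preferences
# for reuse can no longer be recorded either.
```

The comment in the last lines mirrors the trade-off noted above: once identifying metadata are gone, anonymization improves, but individual contributions can no longer be measured.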
Translators A and B, for example, could have their translation data aggregated
with other personal data, allowing a third party to make inferences about them
individually or as members of a group. The use of web-based platforms for trans-
lation is increasingly common, giving translators less control of their translation
data and allowing surveillance of work activities. If personal circumstances lead
to a temporary downturn in productivity or translation quality as gauged via
translator activity data gleaned from the work platform, that could negatively af-
fect their prospects for future employment. If identifiable translation activity data
for an individual that encodes this downturn is shared outside of a single organi-
zation, that could have far-reaching consequences for that individual. This does
not necessarily mean that it is unethical to monitor quality or productivity. An
agency or company needs to be able to stand over their translations. However, by
automating employment decisions or communication, as is the case with project
management in the example of Translator B, a company will leave the translator
with no opportunity to explain translation choices or to build a long-term rela-
tionship based on trust. There is no guarantee of ethical behaviour on the part of
the translator or user of a platform at the best of times, but when relationships
are purely transactional, research has shown that trust and the assumption of
good faith on both sides are particularly undermined, with knock-on effects on
satisfaction and performance (Whipple & Nyaga 2010).
and there are several problems with crowd evaluation, in which anonymous and
presumably untrained internet users rank or rate segments of sequential trans-
lated material. Freitag et al. (2021) found that expert (professional translator) eval-
uators produced markedly different results to crowd workers when carrying out
a detailed error analysis with access to full source and target documents, and
demonstrated a clear preference for human rather than MT output. Additionally,
there are issues with crowd work related to poor rates of pay, labour conditions,
opaque user rating systems, and use of humans (crowd workers) as research
participants without oversight or ethical review. Nonetheless, published results
based on automatic evaluation and crowd work are almost always reported cur-
sorily, devaluing human translation and creating an unrealistic and uncritical
perception of MT among the general public, including translation clients. This
perception increases the likelihood of MT being introduced into professional
workflows.
types of texts, particularly those with a short shelf-life that present little risk.
However, for critical texts in which a mistranslation introduces risk, the use of
MT must be considered carefully and subject to review. There is some evidence
that certain project managers do not want to know or prefer to turn a blind eye
when their translators use MT (Sakamoto 2019), but there may be good reasons
for translation clients to be aware of MT use and to stipulate in contracts whether
or not it may be used.
value if the translator feels that they are in a trust partnership, without which
they may rationalize unethical behaviour (see, for example, Abdallah 2010).
Aside from concerns about risk, translators and users may not wish to use MT
due to the processes described in §2 or due to the impact of artificial intelligence
(AI) on the world of work and sustainability. The following section examines the
latter point with respect to NMT.
4 Sustainability
4.1 Payment, conditions, job satisfaction among
translators/post-editors
Translation is a highly skilled task, but portions of the workflow have been auto-
mated (to an extent) in the examples of Translators A and B, with automatic job
assignment, the imposition of post-editing, and the repurposing of translation
data for tasks that the translators may not expect. There is growing consensus
that AI will have a major impact on work in many areas previously considered to
be immune from automation. While this might not directly cause higher unem-
ployment rates, the changes could affect economic returns, work organization,
and skills management in ways that are difficult to predict. These are consider-
ations for the future in many industries, but in translation the impacts are well
underway for a couple of reasons. Firstly, MT post-editing has been the fastest-
growing area of the translation market since 2010 or so, predating the shift from
statistical to neural MT. Stockpiling of translation data has been commonplace
since the advent of translation memory tools in the early 1990s, although the col-
lection of translation activity data for monitoring and automation is relatively
recent. Secondly, the largely freelance workforce means that translators have
flexibility and autonomy, but work on a project-by-project timeframe. This has
created a disparity of power, whereby translators have little say in processes
and conditions that can be changed unilaterally by agencies and employers from
one project to the next. The effect of the disparity of power is apparent from
the discussions regarding data in §2. As the pace of mergers and acquisitions
has increased, creating large publicly-traded translation conglomerates, the dis-
connect has grown between those making decisions on business operations and
freelance workers doing translation, post-editing, revision, annotation, review,
subtitling, or another of the vast and growing array of roles that engage directly
with texts. Suggestions from the industry to automate project management and
to use blockchain to attribute authorship or contribution are not likely to im-
prove this situation. More generally, the translation industry has not historically
the CO2 output of a car (including fuel) during its full lifetime.6 However, most
training instances are far less resource-intensive than the one reported in this pa-
per. Furthermore, while hardware becomes more powerful and costly to engineer
and produce, optimization of power consumption and the potential to run mas-
sive amounts of parallel processes mean that the power required for training is
dropping. Nonetheless, it remains the case that training an NMT system is costly
and requires a good deal of power. How that impinges on the environment will
depend on the source of that power. There is currently no agreed benchmark for
power consumption when publishing details of MT systems, although some have
been proposed in the context of suggestions for sustainable AI development. The
point made strongly by Van Wynsberghe (2021) is that without a focus on sus-
tainability in the development and deployment of AI (and, by extension, NMT),
AI development itself will not be sustainable.
5 Diversity
5.1 Among developers and users
The cost and power requirements are a huge barrier to entry into NMT devel-
opment. The data requirements meant that early systems had to use publicly
available data (see §2), usually creating systems for major European languages.
It comes as no surprise, then, that initial published work on NMT was conducted
mostly by well-resourced academic research groups in North America and Eu-
rope. This has changed somewhat for two main reasons. Firstly, large technology
companies have thrown their weight behind research efforts in NMT, building
very well-resourced teams that lead the way in optimizing MT systems between
major languages. This means that many academic research groups struggle to
compete in major European languages and have moved to the more “niche” area
of low-resource and minority languages. Secondly, the ability to create synthetic
parallel data by machine-translating monolingual data from the intended target
language into the intended source language7 has led to a jump in quality for
under-resourced language pairs. Thus the Fifth Conference on Machine Translation
(WMT20) included translation tasks for Inuktitut and Tamil to and from English.
However, another way to improve quality for low-resource languages is to build
large multilingual systems, which are usually the preserve of the large commer-
cial teams.
6 We note also that only the largest companies can afford the costs of training such large-scale models.
7 MT researchers call this process “back-translation”. It is not to be confused with “back-translation” used as a glossing technique in standard translation studies sources such as Baker (2018).
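As a rough illustration of the back-translation process just described, the following Python sketch pairs genuine target-language sentences with stand-in reverse translations. `toy_reverse_mt` is a placeholder of our own invention; a real pipeline would call a trained target-to-source MT model.

```python
def back_translate(monolingual_target, reverse_mt):
    """Create synthetic parallel training data: each genuine
    target-language sentence is paired with a machine translation
    of itself into the source language. `reverse_mt` stands in
    for a target-to-source MT system."""
    return [(reverse_mt(t), t) for t in monolingual_target]

# Toy stand-in for a reverse MT system (real use: a trained model).
toy_reverse_mt = lambda sentence: "[src] " + sentence

synthetic = back_translate(["Ceci est une phrase.", "Bonjour."],
                           toy_reverse_mt)
# Each pair is (synthetic source, genuine target); training a
# source-to-target system on such pairs is what has led to quality
# gains for under-resourced language pairs.
```

The key design point is that the target side of every synthetic pair is authentic text, so the system learns to produce fluent target-language output even when the synthetic source side is imperfect.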
7 Summary
As MT quality improves, the technology facilitates more communication either
directly or as part of a translation workflow. However, there are ethical con-
cerns to be considered by MT developers, translation buyers, translation agen-
cies, translators, and consumers of translation. As with all technologies, neither
MT development nor MT output should be considered neutral, but rather as pro-
mulgating the perspective of the developers or the translators who created the
training data, in the tools for interaction with MT and in the output text. Uncrit-
ical reporting of positive MT evaluation results minimizes public awareness of
risk and bias in MT output while potentially devaluing the work of human translators.
Readers may find it useful to consider the issues raised in this chapter
when working with and using MT, and to reflect on how the related processes
fit with their own values, purposes, and principles.
Ethical considerations as laid out in this chapter begin with the source of trans-
lation and translator data, ownership, permissions, copyright, and mode of dis-
tribution. The ethical use of MT within professional translation workflows may
depend on the attitudes of all stakeholders, rules of confidentiality, and the design
decisions behind MT platforms. This relates to sustainability, modes of interac-
tion with MT, and the degree of autonomy and ownership of the process allowed
for translators. The methods by which we can ensure diversity and de-bias MT
systems and data are perhaps least developed, and will no doubt require further
discussion and adjustment over time.
References
Abdallah, Kristiina. 2010. Translator’s agency in production networks. In Tuija
Kinnunen & Kaisa Koskinen (eds.), Translator’s agency, 11–46. Tampere: Tam-
pere University Press.
Baker, Mona. 2018. In other words. London: Routledge.
Canfora, Carmen & Angelika Ottmann. 2020. Risks in neural machine translation.
Translation Spaces 9(1). 58–77.
Chesterman, Andrew. 2001. Proposal for a hieronymic oath. The Translator 7(2).
139–154. DOI: 10.1080/13556509.2001.10799097.
Cronin, Michael. 2017. Eco-translation: Translation and ecology in the age of the
anthropocene. London: Routledge.
Docherty, Peter, Mari Kira & A. B. (Rami) Shari. 2008. What the world needs now
is sustainable work systems. In Peter Docherty, Mari Kira & A. B. (Rami) Shari
(eds.), Creating sustainable work systems: Developing social sustainability, 1–22.
London: Routledge.
European Parliament. 1996. Directive 96/9/EC of the European Parliament and of
the Council of 11 March 1996 on the legal protection of databases. https://eur-
lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:31996L0009.
Floridi, Luciano & Mariarosaria Taddeo. 2016. What is data ethics? Philosophical
Transactions of the Royal Society A 374(2083). DOI: 10.1098/rsta.2016.0360.
Freitag, Markus, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan &
Wolfgang Macherey. 2021. Experts, errors, and context: a large-scale study of
human evaluation for machine translation. Transactions of the Association for
Computational Linguistics 9. 1460–1474. DOI: 10.1162/tacl_a_00437.
ISO. 2015. ISO 17100:2015. Translation services – requirements for translation ser-
vices. https://www.iso.org/standard/59149.html.
ISO. 2017. ISO 18587:2017. Translation services – post-editing of machine translation
output: Requirements. https://www.iso.org/standard/62970.html.
Tiedemann, Jörg. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings
of the 8th International Conference on Language Resources & Evaluation
(LREC 2012), 2214–2218. ELRA. http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf.
Tomalin, Marcus, Bill Byrne, Shauna Concannon, Danielle Saunders & Stefanie
Ullmann. 2021. The practical ethics of bias reduction in machine translation: Why
domain adaptation is better than data debiasing. Ethics and Information Technology
23. 419–433. DOI: 10.1007/s10676-021-09583-1.
Topping, Suzanne. 2000. Sharing translation database information: Considera-
tions for developing an ethical and viable exchange of data. Multilingual 11(5).
59–61.
Toral, Antonio. 2019. Post-editese: An exacerbated translationese. In Proceedings
of machine translation summit XVII, 273–281. EAMT. https://www.aclweb.org/
anthology/W19-6627/.
Troussel, Jean-Christophe & Julien Debussche. 2014. Translation and intellectual
property rights. Luxembourg: Publications Office of the European Union.
Van Wynsberghe, Aimee. 2021. Sustainable AI: AI for sustainability and the sus-
tainability of AI. AI and Ethics 1. 213–218. DOI: 10.1007/s43681-021-00043-6.
Vanmassenhove, Eva. 2019. On the integration of linguistic features into statistical
and neural machine translation. PhD thesis, Dublin City University.
Vanmassenhove, Eva, Dimitar Shterionov & Andy Way. 2019. Lost in translation:
Loss and decay of linguistic richness in machine translation. In Proceedings
of machine translation summit XVII: Research track, 222–232. Dublin: EAMT.
https://www.aclweb.org/anthology/W19-6622.pdf.
Vieira, Lucas Nunes, Minako O’Hagan & Carol O’Sullivan. 2020. Understanding
the societal impacts of machine translation: A critical review of the literature
on medical and legal use cases. Information, Communication & Society 24(11).
1515–1532. DOI: 10.1080/1369118X.2020.1776370.
Wachter, Sandra & Brent Mittelstadt. 2019. A right to reasonable inferences: Re-
thinking data protection law in the age of big data and AI. Columbia Business
Law Review 2019(2). 1–130.
Weizenbaum, Joseph. 1986. Not without us. Computers and Society 16. 2–7.
Whipple, Judith M., Daniel F. Lynch & Gilbert N. Nyaga. 2010. A buyer's perspec-
tive on collaborative versus transactional relationships. Industrial Marketing
Management 39(3). 507–518. DOI: 10.1016/j.indmarman.2008.11.008.
World Intellectual Property Organization. 1979. Berne convention for the protec-
tion of literary and artistic works. (as amended on September 18, 1979). Geneva:
WIPO. http://www.wipo.int/wipolex/en/details.jsp?id=12214.
Chapter 7
How neural machine translation works
Juan Antonio Pérez-Ortiz
Universitat d’Alacant, Spain
Mikel L. Forcada
Universitat d’Alacant, Spain
Felipe Sánchez-Martínez
Universitat d’Alacant, Spain
This chapter presents the main principles behind neural machine translation sys-
tems. We introduce, one by one, key concepts used to describe these systems, so
that the reader achieves a comprehensive view of their inner workings and pos-
sibilities. These concepts include: neural networks, learning algorithms, word em-
beddings, attention, and the encoder–decoder architecture.
1 Introduction
The first thing you should know about neural machine translation (NMT) is that
it considers translation as a task involving operations on numbers performed
by mathematical systems called artificial neural networks: these systems take a
sentence and transform it into a series of numbers. They add some more num-
bers here (usually, thousands or millions of them), multiply by other numbers
there, perform a few additional, relatively simple, mathematical operations, and
eventually output a translation of the original sentence into another language.
Maybe you have always considered translation from a different perspective:
as an intellectual task that involves cognitive processes which can barely be ex-
plicitly enumerated and which take place in some deep areas of the human brain.
And you are indeed right! But the approach currently taken by computers
follows a completely different path: millions of mathematical operations
are performed in a fraction of a second to obtain a translation which may some-
times be labelled as adequate and may sometimes not. And it turns out that the
percentage of times they happen to be adequate has increased dramatically in the
last few years. But, historically, artificial neural networks were devised as a simplified
model of how natural neural networks such as our brains work, and the cognitive
processes carried out in them are also the result of distributed neural computation
processes which are not that different from the mathematical operations men-
tioned above.
This chapter will teach you the key elements of NMT technology. We will
start off by pointing out the connection between how translation could be car-
ried out in a human brain and how an NMT system undertakes it. This will help
us to introduce the basic concepts needed to get a comprehensive overview of
the principles of machine learning and artificial neural networks, which constitute
two of the cornerstones of NMT. After that, we will discuss the essential princi-
ples of non-contextual word embeddings, a computerised representation of words
with many interesting properties that, when combined through a mechanism
known as attention, will produce the so-called contextual word embeddings, a key
factor in the realisation of NMT. All these ingredients will allow us to present an
overall picture of the inner workings of the two most used NMT models, namely,
the transformer and the recurrent models. The chapter wraps up by introducing
a series of secondary themes that will improve your knowledge on how these
systems run behind the scenes.
on the signals they receive from other neurons and the strength of the connec-
tions carrying these signals.
In the first step, the activations of neurons 𝑆1 , 𝑆2 and 𝑆3 , all of them connected
to neuron 𝑆4 , are added, but first each one is multiplied by a weight (𝑤1 , 𝑤2 and
𝑤3 ) representing the strength of their connections; these weights determine how
their activations are turned into actual stimuli for neuron 𝑆4 . Weights may be
positive or negative. For instance, if weight 𝑤2 is positive and the activation of 𝑆2
is high, it will contribute to exciting neuron 𝑆4 (a positive stimulus); if, however,
𝑤2 is negative, it will contribute to inhibiting neuron 𝑆4 (a negative stimulus). In
general terms, neurons connected through positive weights tend to be simulta-
neously excited or inhibited, while neurons connected through negative weights
tend to be in opposite states. Coming back to neuron 𝑆4 , if we add the stimuli
coming from each neuron, we get a net stimulus:
𝑥 = 𝑤1 × 𝑆1 + 𝑤2 × 𝑆2 + 𝑤3 × 𝑆3. (1)
The net stimulus 𝑥 can take any possible value, negative or positive, but it
is not the activation of neuron 𝑆4 yet. In the second step, neuron 𝑆4 reacts to
this stimulus. In the example, when the stimulus is intermediate, that is, not too
positive or too negative, the neuron 𝑆4 is very sensitive to it. However, when
stimuli get large (no matter if positive or negative), changes in their values have
a lesser impact on the output, as the neuron is respectively largely inhibited or
largely excited.
In the example, neuron 𝑆4 is such that its activation is bound between −1 and
+1. Figure 2 represents how neuron 𝑆4 reacts to the stimulus in equation 1. The
reaction is represented with a function 𝐹 (…), called the activation function, which
is applied to the stimulus; the result, 𝐹 (𝑥), is the activation of 𝑆4 .
As can be seen, for values around 0 on the horizontal axis the reaction is pro-
portional to the stimulus, but for large positive or negative stimuli, when the
neuron is very inhibited or very excited, the reaction is much smaller. For this
kind of neuron, the actual extreme values of −1 and +1 are never reached, no
matter how strong the total stimulus is. As said above, neuron 𝑆4 in our exam-
ple is a specific type of neuron with an activation that varies between −1 and +1.
There are other kinds of activation functions with different ranges, but exploring
them is out of the scope of this chapter.
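The two steps just described can be sketched in a few lines of Python. Here, tanh is used as the activation function, since it has exactly the bounded shape described above; the activation and weight values are invented for the example:

```python
import math

def neuron_activation(activations, weights):
    """Two-step artificial neuron: weighted sum of the incoming
    activations, then a bounded activation function."""
    # Step 1: net stimulus, as in equation (1): x = w1*S1 + w2*S2 + w3*S3
    x = sum(w * s for w, s in zip(weights, activations))
    # Step 2: activation function F bounded between -1 and +1
    return math.tanh(x)

# Activations of S1, S2, S3 and their connection weights (invented values)
S = [0.8, -0.3, 0.5]
w = [0.4, -0.7, 0.2]
print(neuron_activation(S, w))  # ≈ 0.56: S4 is moderately excited
```

Note how the negative weight on 𝑆2 turns that neuron's negative activation into a positive stimulus, just as described above.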
is observed in a different development set, which has been reserved or "held out"
for this purpose (see Section 7.2). The technical details of the training algorithm
are beyond the scope of this chapter; let us just say that it is usually based on
computing how much the error function varies when each weight is varied by a
fixed but very small amount (the gradient of the error function), and then vary-
ing each weight a bit in the direction in which it reduces the error function.2 This
type of training is called gradient descent; it is not guaranteed to find the very
best weights, but it is likely that good candidates will be found. The intensity of
these weight variations is regulated by a parameter called the learning rate; this
learning rate is usually higher in the first steps of the training algorithm, but its
magnitude is made progressively smaller as the weights get closer to their final
values. Note that training neural networks is quite laborious: many examples
are necessary and they need to be presented many times for the network to learn. This is often
due to limitations of the training algorithms, however, rather than to the lack of
capacity of a specific neural network to represent the solution to a problem.
Once the weights are determined, training stops (see Section 7.2) and the neu-
ral network can be used to obtain the outputs for new inputs which are not in-
cluded among the examples used during training.
2 Some of you may recognise here the mathematical concept of the derivative of a function.
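The idea of gradient descent can be illustrated on a deliberately simple one-weight error function; the function, starting weight and learning-rate schedule below are all invented for the example, whereas real systems adjust millions of weights at once:

```python
def error(w):
    # Toy one-weight error function whose minimum is at w = 3
    return (w - 3) ** 2

def numerical_gradient(f, w, eps=1e-6):
    # How much the error varies when the weight is varied
    # by a fixed but very small amount
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.0              # initial weight, chosen arbitrarily
learning_rate = 0.3  # higher in the first steps...
for _ in range(100):
    g = numerical_gradient(error, w)
    w -= learning_rate * g  # small step in the direction that reduces the error
    learning_rate *= 0.99   # ...and progressively smaller afterwards
print(round(w, 3))  # close to 3, the weight that minimises the toy error
```

Each pass moves the weight a little downhill on the error curve, which is all that gradient descent does, only in vastly more dimensions.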
idea of gardens and orchards. There are some outliers on the list, especially the
word consistently, which seems in principle disconnected from the rest of the words,
forcing us to put it as far as possible from all of them. Chromosome is another
isolated word, but as flowers and waiters use chromosomes to carry their genetic
information, it may be put somewhere in the middle of the line between these
words but at the same time not very close to red. See Figure 4 for a possible
solution that may not match yours exactly.3
In order to assign mathematical codes to the words in our list, let’s assign
coordinates to each word to reflect its position on the square. As we are in a
two-dimensional space, we need two coordinates for each word: the first coor-
dinate is a number that represents the distance to the left vertical side of the
square; the second coordinate is a number that represents the distance to the
bottom horizontal side of the square. The word restaurant could be assigned, for
example, the two numbers 0.25 and 1.1, and the word menu the numbers 0.6 and
1.3, close to restaurant as seen in Figure 4. These coordinate values can be rep-
resented using vector notation, which simply consists of writing the numbers as
a comma-separated list of values between brackets. The vectors corresponding
to restaurant and menu would therefore be [0.25, 1.1] and [0.6, 1.3], respectively.
Each of these vectors represents a possible word embedding for these two words.
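Closeness between such vectors can be measured as ordinary straight-line (Euclidean) distance. The sketch below uses the two vectors chosen above; the position assumed for chromosome is invented for the example:

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two points (embeddings)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Coordinates for restaurant and menu as chosen above; the position
# for chromosome is invented, placed far from the food-related words.
embeddings = {
    "restaurant": [0.25, 1.1],
    "menu": [0.6, 1.3],
    "chromosome": [1.8, 0.2],
}

d_food = euclidean_distance(embeddings["restaurant"], embeddings["menu"])
d_far = euclidean_distance(embeddings["restaurant"], embeddings["chromosome"])
print(d_food < d_far)  # → True: related words lie closer together
```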
Although it may not be completely obvious, considering embeddings made up
of two numbers instead of a single number boosts the possibilities of solving the
problem of placing words closer or farther apart as we have more freedom to sat-
isfy all the restrictions. In fact, moving from two dimensions to a higher number
of dimensions increases these possibilities even more. A five-dimensional repre-
sentation of a word could be, for example, [2.34, 1.67, 4.81, 3.01, 5.61]. NMT sys-
tems consider embeddings with hundreds of dimensions, and the input sentence
to be translated is represented by a collection of these vast word embeddings.
Word embeddings are learned using the very same algorithm used to learn the
weights of the neural network presented in Section 3.5. In fact, both the weights
and the embeddings are learned at the same time. Bearing in mind that the input
layer of a neural network involved in NMT usually consists of the embeddings
of the words in the input sentence, there is no need to limit ourselves to fixed
vectors. Instead, their values can be repeatedly updated during training in such
a way that the value of the error function is minimised.
3 We have deliberately placed Figure 4 a few pages on, so that you do not see it before you attempt the exercise.
4.1 Generalisation
As already discussed, for the network to be able to properly generalise, that is, to
be able to learn to translate and be capable of translating sentences never seen be-
fore, similar sentences should get similar representations. As sentence represen-
tations are obtained from word embeddings, we may conclude that representing
similar words with similar numbers is a precondition for generalisation in neu-
ral natural language processing. Following our example, words such as poured,
rained, pouring or raining should ideally share similar embeddings as all of them
are semantically similar; the codes for pouring and raining should also be closer
to words such as driving since the three of them are gerunds and may appear in
similar contexts; poured and rained should be neighbours as well because both
of them are past tenses. This is why we usually need many dimensions: we want
words to be close to each other in different ways or for different reasons, simul-
taneously.
Remarkably, the embedding spaces learned in this way support a kind of arithmetic over word meanings (Mikolov et al. 2013), as in:

[king] − [man] + [woman] ≃ [queen]
[Paris] − [France] + [Ireland] ≃ [Dublin]
where the square brackets refer to the embedding of a word, and with ≃ we mean
that the resulting embedding after the operation is close to the embedding of the
word on the right-hand side of the example. This can be interpreted as indicating
that king is to man what queen is to woman, a male or female monarch; and Dublin
is to Ireland what Paris is to France, the capital of a country.
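These analogy relations (king is to man what queen is to woman) can be reproduced with toy vectors. In the sketch below, the two-dimensional embeddings are invented purely for illustration; real systems learn vectors with hundreds of dimensions from data:

```python
# Toy two-dimensional embeddings, invented purely for illustration;
# here the first coordinate loosely encodes "royalty", the second "gender".
emb = {
    "king":  [0.9, 0.8], "queen": [0.9, 0.2],
    "man":   [0.1, 0.8], "woman": [0.1, 0.2],
}

def add(a, b):      return [x + y for x, y in zip(a, b)]
def subtract(a, b): return [x - y for x, y in zip(a, b)]

def nearest(vector, embeddings):
    # Word whose embedding lies closest (squared Euclidean) to the vector
    return min(embeddings,
               key=lambda w: sum((x - y) ** 2
                                 for x, y in zip(embeddings[w], vector)))

# king is to man what queen is to woman:
result = add(subtract(emb["king"], emb["man"]), emb["woman"])
print(nearest(result, emb))  # → queen
```

Subtracting [man] removes the "male" component from [king]; adding [woman] puts the "female" component in its place, landing near [queen].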
1. The first episode will pick up right where the previous season left off.
some attention too (10%), which may be explained by the fact that it helps to
label season as a noun. The contribution of the verb (8%) to the contextual em-
bedding may also be described in terms of its contribution to marking the number
of season as singular. Note that the percentages always add up to 100%.
Determining exactly how the attention vector is used to combine the original
non-contextual embeddings into a new embedding is beyond the scope of this
chapter. Suffice it to say that the procedure
involves a specific sequence of mathematical operations and that the resulting
embedding will be located somewhere in between the original embeddings.
Following our running example, nine different attention vectors will be com-
puted for this sentence (one for each word) and then applied to the original non-
contextual embeddings in order to obtain a collection of nine new embeddings,
each one corresponding to a different word in the sentence. These new embed-
dings may be considered as contextual embeddings as they are influenced to
different degrees by the rest of the words in the sentence.
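One simple way to picture how an attention vector combines embeddings is as a weighted average, as sketched below; real systems apply further learned transformations, and the vectors and percentages here are invented for the example:

```python
def contextual_embedding(attention, embeddings):
    """Weighted average of non-contextual embeddings: each word
    contributes in proportion to the attention it receives."""
    assert abs(sum(attention) - 1.0) < 1e-9  # percentages add up to 100%
    dims = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(attention, embeddings))
            for d in range(dims)]

# Toy 2-D non-contextual embeddings for three words,
# with attention shares of 80%, 10% and 10%
word_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attention = [0.8, 0.1, 0.1]
print(contextual_embedding(attention, word_vectors))
# a point between the originals, pulled towards the first word
```

The result lies somewhere in between the original embeddings, closest to the word receiving most attention, exactly as described above.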
cooked in it. A single attention vector would have to mix both flavours in a sin-
gle embedding, containing too much heterogeneous information that could nega-
tively affect the search for a translation for the word represented by the embed-
ding. For this reason, some NMT systems obtain different attentions for each
word in each layer and use them to compute a number of different embeddings
for each word. Each of these embeddings is said to be computed by a different
head. T-NLG has 28 attention heads in each layer. Therefore, its last layer pro-
duces 28 different 4,256-dimensional embeddings for each word.
the embeddings of all the words in the source sentence as well as to the embed-
dings of the target words already generated. The whole architecture is called a
transformer (Vaswani et al. 2017). Figure 5 shows an example of a three-layered
encoder and the degrees of attention considered in order to compute an embed-
ding in the second layer and in the third one. Figure 6 depicts this encoder in an
extended diagram that also includes the decoder so that it represents the whole
transformer architecture.
A parallel corpus is used by the learning algorithm to obtain a set of weights,
embeddings and attention vectors for the transformer such that the training data
can be reproduced up to a certain degree and the system is able to generalise
beyond the sentences in the training set.
For example, assume that a transformer with one single head per layer is used
to translate the sentence “My grandpa baked bread in his oven daily” into Span-
ish. The encoder first produces a collection of eight embedding vectors. The de-
coder then computes an 8-dimensional attention vector such as [60%, 10%, 0%, 0%,
0%, 30%, 0%, 0%] and uses it to obtain a flavour of the source sentence that allows
it to obtain an embedding for the first word in the target sentence. Let us assume
that the system correctly generates the Spanish word mi. The decoder will then
compute a 9-dimensional attention vector such as [50%, 10%, 0%, 0%, 0%, 20%, 0%,
0%, 20%] (the last percentage corresponds to the attention paid to the first word
in the target sentence) and use it to obtain an embedding for the second word in
the target sentence. The procedure will continue until the decoder generates a
special token that marks the end of the sentence.
The output of the decoder at each step is not exactly an estimation of the
embedding of the next word. Actually, an additional layer is added at the end of
the decoder to compute a vector of probabilities or likelihoods for each word in
the target-language vocabulary. Section 7.3 will discuss how these probabilities
can be used in order to obtain the sequence of words that result in the target-
language sentence.
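A standard way for such a final layer to turn raw scores into probabilities is the softmax function, sketched below with a made-up four-word vocabulary and made-up scores:

```python
import math

def softmax(scores):
    # Turn arbitrary real-valued scores into probabilities:
    # all positive, summing to 1; the higher the score, the higher the probability.
    shifted = [s - max(scores) for s in scores]  # for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for a tiny four-word target vocabulary
vocabulary = ["mi", "abuelo", "pan", "<end>"]
probs = softmax([2.1, 0.3, -1.0, 0.2])
print(max(zip(probs, vocabulary)))  # the highest probability and its word
```

In a real system the vocabulary has tens of thousands of entries, but the principle is the same: every target word gets a probability at every step.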
of embeddings for the words in the input sentence and a decoder that uses atten-
tion to compute embeddings for each target word by integrating the information
from the input words and the already generated target words. The encoder and
decoder in the recurrent model, however, compute the contextual word embed-
dings in a local manner in such a way that the embeddings for the fifth encoded
word, for example, are based on the embeddings of the first four words, on the
one hand, and the embeddings of the next words, on the other hand. This is
achieved by traversing the input sentence from left to right and from right to
left; see Figure 7 for a diagram of this model showing only left-to-right process-
ing.
It is worth noting that the mathematical model used imposes some restric-
tions on the relevance given to the words around the word for which the con-
textual word embeddings are computed (in our example the fifth one), resulting
in a mechanism that focuses mainly on the nearest words and tends to ignore
the representations of distant words. Similarly to the transformer, a final layer
at the end of the decoder computes a vector that gives the probability of each
target-language word being the word at the corresponding position in the output
sentence. Forcada (2017) describes in more detail the recurrent encoder–decoder
model and also discusses the kind of outputs that NMT produces.
7 Additional settings
7.1 Words and sub-words
According to what has been presented in this chapter, independently of whether
a transformer or a recurrent model is used, an embedding is obtained for each
word after training. Does this mean that we end up having an embedding for ev-
ery possible word in the language? Not really. Languages, especially those which
are highly inflected or agglutinative, may easily have hundreds of thousands or
even millions of different word forms. In order to understand why this poses a
challenge for NMT systems you should know that the number of word embed-
dings (which is referred to as the vocabulary) conditions the number of weights in
the neural network and that large neural networks often struggle to generalise
to unseen data. The size of the vocabulary could be reduced by considering only
those word forms present in the training corpus but this usually still implies con-
sidering a substantial number of words and raises a new issue: when training is
finished and the NMT system undertakes the translation of new sentences con-
taining words not in the training set, these unseen words will make the model
perform clumsily and lose accuracy as every unknown word is assigned a single
non-contextual embedding reserved for this situation.
The solution engineers came up with is to split words into so-called sub-word
units. Ideally, these units should make linguistic sense and carry some compo-
nents of meaning; for instance, splitting demystifying as de- + -myst- + -ify- +
-ing surely makes more linguistic sense (and is therefore likely to be more help-
ful when it comes to performing machine translation) than splitting it as dem-
+ -ystif- + -yi- + -ng. But performing a linguistically sound splitting requires the
existence of a set of splitting rules and procedures for the language in question,
a resource that may not be available for many languages.
A commonly-used workaround is to automatically learn splitting rules by in-
specting large texts, such as one containing all the source or all the target sen-
tences in the training set. A popular approach5 is called byte-pair encoding (BPE)
(Sennrich et al. 2016), and starts with letter-sized units which are joined into
two-letter, three-letter, etc. units when they appear frequently in the corpus.6
Byte-pair encoding would probably identify a frequent -ing suffix in many verb
5 There are more advanced methods such as SentencePiece (Kudo & Richardson 2018), which treats the whole text as a sequence of characters and performs word division (tokenization) and sub-word division in one fell swoop.
6 Byte-pair encoding was originally a text compression algorithm: frequent letter (byte) sequences would be stored once and replaced by short codes to reduce the total storage needed.
forms (marching, considering) and chop it off, even for unseen forms (such as bart-
simpsoning); -ing would then be turned into a contextual embedding carrying its
atomic meaning.
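The core loop of byte-pair encoding can be sketched as follows: repeatedly find the most frequent pair of adjacent units in the corpus and merge it into a single unit. This is a bare-bones illustration with a tiny invented corpus, not the full method of Sennrich et al. (2016):

```python
from collections import Counter

def most_frequent_pair(words):
    # Count every pair of adjacent units across the corpus
    pairs = Counter()
    for units, freq in words.items():
        for a, b in zip(units, units[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    # Replace every occurrence of the pair with a single joined unit
    merged = {}
    for units, freq in words.items():
        out, i = [], 0
        while i < len(units):
            if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
                out.append(units[i] + units[i + 1])
                i += 2
            else:
                out.append(units[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Tiny corpus: words as tuples of letter-sized units, with frequencies
words = {tuple("marching"): 5, tuple("considering"): 4, tuple("singer"): 2}
for _ in range(3):  # learn three merge operations
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
print(words)  # after three merges an "ing" unit has emerged
```

Starting from letter-sized units, frequent sequences fuse step by step, so a productive suffix like -ing ends up as a single sub-word unit.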
instance, the three most likely words, and clone the system into three systems,
each of which would be determined respectively by each of the three choices, and
see how they fare. But one cannot do this indefinitely, as one would triple the
number of systems translating the sentence at each step, and their number would
grow exponentially. To avoid that, only a certain number of systems are allowed
to survive, namely those obtaining the best value in an approximate calculation
of the probability of the full sentence that would be produced. This is usually
called beam search and is a common approximation in other probabilistic models
of human language processing such as speech recognition.
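Beam search can be sketched with an invented toy table of next-word probabilities; at each step only the best beam_size hypotheses, scored by the (log-)probability of the sentence so far, are allowed to survive:

```python
import math

# Invented next-word probabilities for a toy model:
# given the last word, which words may follow and how likely they are.
NEXT = {
    "<start>": {"my": 0.6, "the": 0.4},
    "my": {"grandpa": 0.7, "oven": 0.3},
    "the": {"oven": 0.8, "bread": 0.2},
    "grandpa": {"<end>": 1.0},
    "oven": {"<end>": 1.0},
    "bread": {"<end>": 1.0},
}

def beam_search(beam_size=2, max_len=5):
    # Each hypothesis: (log-probability of the sequence, sequence of words)
    beams = [(0.0, ["<start>"])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "<end>":        # finished hypotheses survive as-is
                candidates.append((logp, seq))
                continue
            for word, p in NEXT[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [word]))
        # Only the best `beam_size` hypotheses are allowed to survive
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    return beams[0][1]

print(beam_search())  # → ['<start>', 'my', 'grandpa', '<end>']
```

With a beam size of 1 this reduces to always picking the single most likely next word; a wider beam lets a hypothesis that starts less promisingly win overall.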
8 Conclusions
To train an NMT system, one needs thousands or even millions of examples of
source sentence–target sentence pairs. For many language pairs, many domains
and many text genres, such resources do not exist, which constrains many spe-
cific applications, but for well-resourced languages, general-purpose NMT is a
reality and is very widely used, not only by translators. Moreover, scientific ad-
vances in approaches such as multilingual models or unsupervised NMT have
recently started to produce promising results in low-resource scenarios.7
This chapter has introduced – and provided technical details of – the key ele-
ments in NMT systems, and explored how they interact in the two currently most
popular architectures, namely transformer-based and recurrent neural networks.
Research activity in the area is so intense at the time of writing that proposals
for new models arise almost every month. Transformers are currently the para-
digm of choice if enough parallel corpora are available for training, because they
require shorter training times and allow subtle quality improvements in compar-
ison to recurrent neural networks, but the picture may change dramatically at
any time.
7 A multilingual model is a single neural network that is trained to translate between many different language pairs so that knowledge from well-resourced languages may be transferred to low-resourced ones. Interestingly, multilingual models bring the possibility of zero-shot translation (Ko et al. 2021), in which a system may be able to translate with reasonable quality, for example, between Spanish and Upper Sorbian using a multilingual model trained on German–Upper Sorbian and Spanish–German corpora, even when no Spanish–Upper Sorbian parallel corpus is available. Unsupervised NMT goes a step further by learning NMT systems from monolingual corpora only.
References
Bahdanau, Dzmitry, Kyunghyun Cho & Yoshua Bengio. 2015. Neural machine
translation by jointly learning to align and translate. In Yoshua Bengio & Yann
LeCun (eds.), 3rd International Conference on Learning Representations, ICLR
2015. DOI: 10.48550/arXiv.1409.0473.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka-
plan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry,
Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger,
Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu,
Clemens Winter, Dario Amodei, Christopher Hesse, Mark Chen, Eric Sigler,
Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner,
Sam McCandlish, Alec Radford, Ilya Sutskever & Dario Amodei. 2020. Lan-
guage models are few-shot learners. CoRR abs/2005.14165. https://arxiv.org/
abs/2005.14165.
Forcada, Mikel. 2017. Making sense of neural machine translation. Translation
Spaces 6(2). 291–309.
Goodfellow, Ian, Yoshua Bengio & Aaron Courville. 2016. Deep learning. Cam-
bridge, MA: MIT Press.
Hornik, Kurt. 1991. Approximation capabilities of multilayer feedforward net-
works. Neural Networks 4(2). 251–257.
Ko, Wei-Jen, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Na-
man Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn & Mona Diab.
2021. Adapting high-resource NMT models to translate low-resource related
languages without parallel data. In Proceedings of the 59th annual meeting of
the Association for Computational Linguistics and the 11th International Joint
Conference on Natural Language Processing, 802–812.
Kudo, Taku & John Richardson. 2018. SentencePiece: A simple and language in-
dependent subword tokenizer and detokenizer for neural text processing. In
Proceedings of the 2018 Conference on Empirical Methods in Natural Language
Processing: System Demonstrations, 66–71. Brussels, Belgium: Association for
Computational Linguistics.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado & Jeffrey Dean. 2013.
Distributed representations of words and phrases and their compositionality.
In Advances in Neural Information Processing Systems 30, 3111–3119.
Papineni, Kishore, Salim Roukos, Todd Ward & Wei-Jing Zhu. 2002. BLEU: A
method for automatic evaluation of machine translation. In Proceedings of the
40th Annual Meeting of the Association for Computational Linguistics, 311–318.
Chapter 8
Custom machine translation
Gema Ramírez-Sánchez
Prompsit Language Engineering
This chapter gives an overview of the theoretical and practical implications of cus-
tomizing machine translation (MT) to make it fit for a particular purpose. The chap-
ter is written for readers who have just a basic knowledge of MT, but experts who
are seeking new ways of explaining MT to non-experts may also find it useful. The
MT paradigm assumed in the chapter is that of neural MT.
1 Introduction
1.1 Generic machine translation
Most casual users of machine translation (MT) undoubtedly rely on generic MT,
that is, MT based on engines trained to cover a wide range of topics, styles and
genres, and not specialized in any particular domain.
While generic engines may be perfectly suitable for general-purpose usage,
they may become less useful for texts that use a narrow range of vocabulary, or
have very particular, characteristic styles, or are constrained by the conventions
of a particular genre. This typically applies to texts associated with highly spe-
cialized domains such as law or medicine, but such constraints are also a feature
of texts that we encounter in every-day life. Recipes, for example, have typical
structures and vocabulary that differentiate them from other “every-day” texts
like consumer guides. You hardly ever find questions in recipes, but these are
frequent in consumer guides. Both types of text are often translated into other
languages, either for casual use (think of a search engine translating something
like “how can I make chocolate cookies?”, or “what type of light bulb is recom-
mended to save energy?”), or for professional use (think of a publisher translat-
ing a recipe book, or a manufacturer translating technical specifications for a
Don’t worry if you are not fluent in Spanish, French or Italian. Just do a sim-
ilar test yourself by translating apple crumble into a language you know using
your favourite online MT service. You’ll probably see translations related to ‘col-
lapsing apples’ or ‘apples that fall apart’ like we see in some of the examples in
Table 1. Other translations, namely crumble aux pommes and Crumble di mele, are
good (which is why we have glossed them in Table 1 simply as ‘apple crumble’).
How the MT engine copes depends on the data used to train the engines. Spe-
cial steps may also be taken to feed engines with the correct terminology during
training or as a post-translation step. But, without built-in treatment of special-
ized terminology, what we get from system MT2 is fairly typical of what we can
expect from generic MT.
166
8 Custom machine translation
Now let’s dive into the intriguing world of light bulbs. A huge variety of spe-
cialized information is available to consumers eager to learn about the many
types of light bulbs on the market. So, imagine you are a non-native speaker of
English living in an English-speaking country; a light goes out, and your neigh-
bour offers to give you a twisted fluorescent lamp. You offer your best smile in
exchange, as you are not sure what the neighbour means exactly. Using your
phone, you check the translations provided by some generic MT systems. They
provide the translations reproduced in Table 2.
Table 2: Machine Translation of a type of light bulb: twisted fluorescent
lamp.
After reading this, you expect your neighbour to give you a funny-shaped stan-
dard lamp or table lamp. Where are you getting this idea from? Oh, ambiguity:
none of the engines provides a word meaning light bulb as a translation for lamp.
All of them go for the other meaning of lamp, where the word stands for the
whole piece of lighting equipment.
Generic MT got it wrong, but a custom MT engine should be able to get it right.
But what is custom MT? The next section should give you an idea.
Anything that helps your company improve communication with its internal
staff, train car salespeople, or convince buyers, is deemed a key activity, and
in a multilingual environment like this one, MT can be very helpful. Your com-
pany thus uses MT to produce the first draft of nearly all its translations. Review-
ers, known as post-editors (see O’Brien 2022 [this volume]), then improve these
drafts.
Your company starts out on its MT journey using a generic MT system and im-
proving the output through post-editing. Soon, the post-editors begin to realize
that they need to fix the same terminology, genre and style mistakes over and
over again: this is not very appealing or efficient. Your company then remem-
bers that it has been producing translations for decades, and wonders whether it
could use these existing translations to somehow improve the process.
The answer is yes. But how? First, by including its own past translations as
training data, your company allows the MT engine to learn from them. That is,
it uses its own training data to create a custom MT engine. The custom engine
produces draft translations which are much closer to the company’s past trans-
lations and have far fewer errors in terminology and style, and the post-editors
are happier.
But are things really as simple as that? Well, yes, but only if you have a suffi-
cient amount of data (millions of translated sentences) which is in the right for-
mat (aligned parallel data; see Kenny 2022 [this volume]), is internally consistent
(otherwise be prepared for inconsistent output), and is in the desired language
pair or pairs. You also need engineers or external providers to train the system
and integrate it into the company’s translation workflow, as well as the right
hardware and software. All this just to start with. Then you need a retraining
plan if you want to take advantage of the next translations that will be produced:
this can be on-the-fly if you work with adaptive MT, every six hours if you have
crazy production numbers, every six months or once a year if you just want to
keep the system up-to-date and consistent.
So maybe “simple” is not the word, and you might ask whether all the effort
will be worth it. Let’s set reasonable expectations.
Given sufficient effort and the right resources, custom MT is capable of out-
putting text without the kinds of error in Table 3. Among the resources required
are suitable human resources, which I address in the next section.
2 For the purposes of this analogy, each different stamp represents a different domain, and texts
with no stamp can be considered as non-domain specific.
You might start with the list of words and observe further relationships between
other words, phrases and longer chunks. Or you could use the fact that texts can
be classified into those with stamps and those without stamps. You could, for
example, start grouping together texts carrying similar stamps. Or you could use
all the texts at the same time. A myriad of possibilities is open to you to start
learning from data about translation between L1 and L2.
At this point, when an MT system is learning how to translate, it is in the
same situation as you are on this mission: you both have texts, bilingual (and
monolingual), maybe also terminology lists, but nothing else. These are the only
sources to learn from, but there are different ways of carrying out the learning
process.
Back on your newly discovered planet, you first open your mind and try to
learn from what you can observe in the bilingual L1-L2 texts. You use the lists
already compiled to check if your assumptions are right and then try to build on
these assumptions by forming new assumptions. You soon move from words to
longer chunks. At this point you don’t pay attention to whether a text is stamped
or not; you try to use all resources together as a whole.
In a similar way, to build a first MT system, one usually starts by concatenating,
or stringing together, all bilingual data, regardless of the different domains they
come from, and by performing initial training using the default software settings.
After a first effort, you start getting messages from the L1 speaker and translat-
ing them into the L2. Then you show your translated messages to the L2 speaker
to validate them. And then you repeat the process working the other way round.
You keep improving your knowledge as you interpret the expressions on the L1
and L2 speakers’ faces. Sometimes they laugh, but most of the time they nod
their heads, and sometimes they even look as if they get it. You learn from their
feedback and keep going.
In MT development, evaluation is not usually based on human (or extrater-
restrial) assessment. Rather, we use automatic metrics to compute quality scores
based on a comparison of the machine’s output with translations already pro-
duced by professional translators (see Rossi & Carré 2022 [this volume]). In most
cases, the more similar the machine output is to the human translation, the bet-
ter it is deemed to be. If the automatic metrics suggest that things are OK, and a
quick inspection of the output suggests that it does not present any major issues,
the system can be considered as a functional baseline. Otherwise, we keep train-
ing, maybe adding some pre-processing or post-processing. After each round of
training, we check our automatic metrics. When the scores are as good as we
think we can get them, we stop training.
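To give a feel for how such metrics work, here is a deliberately simplified overlap score in Python. It is not BLEU or chrF (real metrics add brevity penalties, several n-gram orders and, in chrF's case, character n-grams), but it captures the underlying idea: the more of the machine output that also appears in the human reference translation, the higher the score.

```python
from collections import Counter

def ngram_overlap(hypothesis, reference, n=2):
    """Toy metric: the fraction of the hypothesis's n-grams that also
    occur in the reference translation. More overlap with the human
    translation means a higher score."""
    def ngrams(text, n):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    if not hyp:
        return 0.0
    # Clip each n-gram's count so repeated output words are not over-rewarded.
    matches = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matches / sum(hyp.values())
```

Identical sentences score 1.0, completely unrelated ones 0.0, and partial matches fall in between.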
Back on the newly discovered planet, as you progress in your learning pro-
cess, you discover that there is more than one translation for some words in the
same language combination, but that correspondents are consistent across texts
marked with the same stamp. So you decide to separate the texts by their stamps
and to compile specific correspondences in separate lists. You start further in-
specting non-stamped texts and then move to stamped ones. Stamped texts look
a bit different to non-stamped ones: for example, sentences tend to be very long
in some stamped texts while in non-stamped texts they are very short.
Given this situation, depending on how much data we have and the final goal
of the system, we could train an MT engine using only texts that share the same
stamp (and where the stamp represents a domain). Our in-domain system could
use both generic and domain-specific texts or just domain-specific ones. And we
will definitely make the most of state-of-the-art MT techniques to make the sys-
tem as domain-aware as possible. This is exactly what customization is about:
playing with data and techniques. In what follows we explain each of these ap-
proaches in basic terms. A more comprehensive survey of domain adaptation in
neural MT is provided by Saunders (2021).
data needs to represent a generous proportion of the whole training data set,
otherwise our system will not be able to learn how to produce in-domain trans-
lations.
The unspecific measure “a generous proportion” is used here on purpose as
we know that it is very rare to have enough in-domain data to train a system.
After all, we will need at least several million sentence pairs; maybe less than for
generic MT, but still a lot. So, we normally end up mixing the available in-domain
data with generic or out-of-domain data.
Depending on the language combination, when adding the available in-
domain data to the out-of-domain data as the first step in customizing an MT
system, we are usually faced with one of two very different scenarios: we either
have too much data or too little data. When it comes to data, size matters.
Experience shows that not only bilingual data, but also monolingual and multi-
lingual data, and generic, in-domain and multi-domain data, have all proven use-
ful in helping MT systems to learn (Saunders 2021). What is more, tiny amounts
of data are starting to be taken into account in adaptive or incremental MT sce-
narios (see O’Brien 2022 [this volume]). The landscape is changing fast but one
thing is certain: provided that there is some data, there is a chance for learning,
and MT will make the most of it.
this problem consists in filtering out noise: using a mix of patterns and rules
to remove obvious errors, scoring sentences for quality, and classifying them to
discriminate between high-quality and low-quality content. It also includes the
removal of duplicates (Khayrallah & Koehn 2018).
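A toy version of such a cleaning step might look as follows; the thresholds and rules here are invented for illustration, and production pipelines use much more elaborate scoring and classification:

```python
import re

def clean_parallel_data(pairs):
    """Miniature sketch of bilingual data cleaning: rule-based filters
    remove obvious noise, and exact duplicates are dropped."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        ratio = len(src) / len(tgt)
        if ratio > 3 or ratio < 1 / 3:
            continue  # implausible length ratio between the two sides
        if re.fullmatch(r"[\W\d_]+", src) or re.fullmatch(r"[\W\d_]+", tgt):
            continue  # no alphabetic content (numbers, punctuation only)
        if (src, tgt) in seen:
            continue  # exact duplicate of a pair already kept
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```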
help in tailoring the output towards more in-domain-like language. It is also pos-
sible to use domain controllers or discriminators to label training data at word,
sentence or even embedding level. This technique consists in identifying precisely what in the generic data is closer to or further from the in-domain data
and using this information during training.
4 Customization in practice
Theory and practice are frequently two sides of the same coin. This section gives
very practical details on customizing a neural MT engine. It is aimed at beginners
and does not assume advanced technical knowledge.
4 At the time of writing, there are more than 40 providers offering MT services, and around 20
of them provide some customization options. Source: https://inten.to/mt-landscape/, last accessed 26
June 2022.
All this can be done in a semi-automatic way, where a real person takes care
of the data and the processes involved, or in a fully automatic way, where customization happens without further human intervention.
There are also MT testing environments designed for teaching language pro-
fessionals how MT works. These are used in translation technology classrooms
or in professional environments as training tools. They usually offer customiza-
tion options to see what happens when a system is trained with more generic or
in-domain data. A good example of such an environment is MutNMT.5
• Language combination
• Domain
• Available data
• Deadline
The purpose of most neural MT systems is to produce the best raw output
possible, but additional requirements usually need to be met. Before you opt for
a particular service provider you might need to consider whether their language
models can be accessed remotely or whether you actually need access to them on
your own premises. Will the system be used online with concurrent users or as a
batch, one-at-a-time queued process? Will the system translate text strings? Or
does it need to support translation using different file formats? Will it be accessed
through an API, web app, or a connector to a third-party tool? And so on.
Deployment of custom MT might require a trade-off between quality and meeting a delivery deadline. The best possible system may need more training days
than you can afford.
Finally, hardware is also a key component, both for training and for later use
of a system in production. Depending on other factors, you may need a service
that is available 24/7, and uses several GPUs/CPUs at the same time.
• Overlap between the sentences in these three sets must be avoided at all
costs. Indeed, if possible, sentences from different sources should be used
in each set to guarantee their balance and independence.
8 Free alignment tools include LF Aligner (https://sourceforge.net/projects/aligner/, last accessed 26 June 2022), while well-known paid-for alignment tools include those provided with
translation memory tools (see Kenny 2022 [this volume]).
9 I am assuming here that the training unit is the sentence, not the document.
• Training sets can vary in size from several thousand to several million sen-
tences, but validation and test sets do not normally exceed 5,000 sentences.
• Training data may contain generic and custom data, but validation and test
data should be as in-domain as possible to test the suitability of the trained
model for its intended purpose.
• When the training set contains both generic and in-domain data, the pro-
portion of in-domain data needs to be as large as possible, otherwise the
model will take most of its knowledge from the generic data.
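Assuming sentence pairs that fit in memory (large corpora would be streamed from disk), the splitting guidelines above can be sketched in Python as follows; splitting strictly by source would be a further refinement:

```python
import random

def split_corpus(pairs, valid_size=2000, test_size=2000, seed=42):
    """Sketch of a train/validation/test split. Exact duplicates are
    removed first, so no sentence pair can appear in more than one set;
    the validation and test sets stay small (here 2,000 pairs each)."""
    unique = list(dict.fromkeys(pairs))  # deduplicate, preserving order
    random.Random(seed).shuffle(unique)  # fixed seed for reproducibility
    test = unique[:test_size]
    valid = unique[test_size:test_size + valid_size]
    train = unique[test_size + valid_size:]
    return train, valid, test
```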
Thus far, nothing we have mentioned is peculiar to custom engines; all this
preparation applies equally to generic and custom MT. The next steps, related
to pre-processing, also apply in both scenarios but may vary for particular lan-
guages or language combinations:
• Text may need to be truecased: we may want our training process to cap-
ture the fact that a word spelled with an initial capital letter at the begin-
ning of sentences in our training data (for example, The) is the same word
as that spelled all in lower case (in this case, the) in other sentence positions in
the data. We thus use truecasers to convert all words except proper names
(in languages like English) to lower case.10 Truecasing only applies to lan-
guages that distinguish uppercase and lowercase so it is not applicable to
Chinese, Arabic, Hebrew and many others.
10 Note that all nouns in German begin with upper case, and, like proper nouns in English, these
should not be truecased.
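As a miniature illustration (real systems use dedicated truecasing tools; the two functions below are invented for the purpose), a truecaser can learn, for each word, its most frequent casing in non-initial positions and then apply that casing to sentence-initial words:

```python
from collections import Counter

def build_truecaser(sentences):
    """Learn each word's most frequent casing in non-initial positions,
    where capitalization is not forced by sentence position."""
    counts = {}
    for sent in sentences:
        for token in sent.split()[1:]:  # skip the sentence-initial token
            counts.setdefault(token.lower(), Counter())[token] += 1
    return {low: c.most_common(1)[0][0] for low, c in counts.items()}

def truecase(sentence, model):
    """Re-case the first word using its most frequent non-initial casing;
    unknown first words are simply lowercased (a known weakness of this
    naive approach, since they may be proper names)."""
    tokens = sentence.split()
    if tokens:
        tokens[0] = model.get(tokens[0].lower(), tokens[0].lower())
    return " ".join(tokens)
```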
All of these parameters usually come with default values that developers have
set after optimizing the training process for a particular environment.
Once parameters have been set – or the default parameters have been accepted
– training proceeds as follows: at each training step, a batch of training data is
fed into the neural network, the output for each sentence in the batch is com-
puted, the error loss is computed, weights are updated, and it all starts again!
After a predetermined number of training steps (set by the validation frequency),
the engine’s performance is evaluated, and then training resumes. When further
training fails to improve the engine’s performance, or the performance starts to
degrade, the training stops. There it is, our model!
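In outline, and with a simulated validation score standing in for a real neural network (actual training is handled by an NMT toolkit; the numbers below are invented), the loop just described looks something like this:

```python
import random

def train(num_steps=10_000, validation_frequency=500, patience=5):
    """Sketch of the training loop: batches are processed step by step,
    performance is evaluated every `validation_frequency` steps, and
    training stops early after `patience` validations with no improvement."""
    rng = random.Random(0)
    best_score, bad_validations = float("-inf"), 0
    for step in range(1, num_steps + 1):
        # In a real system: feed a batch, compute the loss, update weights.
        _ = rng.random()  # placeholder for one training step
        if step % validation_frequency == 0:
            # Simulated validation score: rises at first, then plateaus.
            score = min(step / 3000, 1.0) + rng.uniform(-0.01, 0.01)
            if score > best_score:
                best_score, bad_validations = score, 0
            else:
                bad_validations += 1
            if bad_validations >= patience:
                return step, best_score  # early stopping: no more progress
    return num_steps, best_score
```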
14 Given the large amounts of data used in NMT, the use of epochs to measure the duration of
the training can be impractical. Rather than using epochs, you can use the number of steps, in
relation to a particular batch size, to help you measure duration independently of the model,
language pair or amount of data.
15 Perplexity in natural language processing, and more specifically in MT, measures how uncertain a translation model is about predicting the next word when translating. A low perplexity
is obtained when the translation model assigns a high probability to each word/token in a
given target sentence. For more information on BLEU and chrF1, see Rossi & Carré (2022 [this
volume]).
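Perplexity as just defined is straightforward to compute: given the probability the model assigned to each token of a target sentence, it is the exponential of the average negative log probability.

```python
import math

def perplexity(token_probabilities):
    """Perplexity over one target sentence: exp of the average negative
    log probability of its tokens. Confident (high) probabilities give
    low perplexity; a model that is always certain scores exactly 1."""
    n = len(token_probabilities)
    return math.exp(-sum(math.log(p) for p in token_probabilities) / n)
```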
Finally, if you have gone to the trouble of creating a custom system, with all
the excitement and pain that this might entail, you might want to compare the
output of your system with that of a generic system, using any of the relevant
testing options above. If your custom system outperforms the generic system,
then it is a success. Well done! Otherwise, try to keep having fun!
5 Conclusion
This chapter has provided a brief overview of the customization of MT. Hav-
ing differentiated between custom MT and generic MT, the chapter stressed the
importance of managing expectations when it comes to customization, before
introducing the professional roles involved in custom NMT, and asking where
MT sits in the translation workflow. Customization through both data and tech-
niques was discussed, and analogies with real-life learning processes were sug-
gested. The chapter concluded with a practical section on tools, customization
strategy, data compilation and preparation, training and – finally – testing, in a
bid to help readers get hands-on experience of custom MT.
6 Acknowledgments
I would like to thank Dorothy Kenny, Jaume Zaragoza Bernabeu, Carmen Iniesta
López, Amelia Arenas Olivares and Maite Heredia Arribas for their time and
suggestions on how to improve this text. Thanks also to Reinhard Rapp and Felix
Kopecky for their thorough review of this chapter. Any remaining errors are
mine alone.
References
Denkowski, Michael & Alon Lavie. 2014. Meteor universal: Language specific
translation evaluation for any target language. In Proceedings of the ninth work-
shop on Statistical Machine Translation, 376–380. Baltimore: Association for
Computational Linguistics. DOI: 10.3115/v1/W14-3348.
Kenny, Dorothy. 2022. Human and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 23–49. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759976.
Khayrallah, Huda & Philipp Koehn. 2018. On the impact of various types of noise
on neural machine translation. In Proceedings of the 2nd workshop on neural
machine translation and generation, 74–83. Melbourne: Association for Com-
putational Linguistics. https://aclanthology.org/W18-2709.pdf.
Chapter 9
Machine translation for language learners
Alice Carré (a), Dorothy Kenny (b), Caroline Rossi (a), Pilar Sánchez-Gijón (c) & Olga Torres-Hostench (c)
(a) Université Grenoble-Alpes, (b) Dublin City University, (c) Universitat Autònoma de Barcelona
Machine Translation (MT) has been controversial in second and foreign language
learning, but the strategic integration of MT might be beneficial to language learn-
ing in certain contexts. In this chapter we discuss the conditions in which MT can
be useful in language learning, set out digital alternatives to MT, and provide ex-
amples of how MT can support language learners.
1 Introduction
Machine translation (MT) has been controversial in second and foreign language
learning,1 with some commentators arguing that it can encourage plagiarism,
promote errors or deflect learners from what they should be doing. In some cases,
however, MT has been found to help students complete certain tasks, and there
appears to be merit in considering MT as just one among many digital resources
that contemporary language learners can use. The successful integration of MT
into language learning requires us to understand, even at a basic level: how the
technology works, how we can judge the quality of its outputs, how those outputs
can be improved through intervention either before or after the fact of transla-
tion (through pre-editing or post-editing), and what the ethical issues in using
1 Note that we use the generic terms language learning and language learner in this chapter to
cover instances of foreign language learning and second and subsequent language learning. If
a student’s first language is their L1, then the language learning to which we refer corresponds
to their learning of an L2, L3 or Ln.
Alice Carré, Dorothy Kenny, Caroline Rossi, Pilar Sánchez-Gijón & Olga
Torres-Hostench. 2022. Machine translation for language learners. In
Dorothy Kenny (ed.), Machine translation for everyone: Empowering users in
the age of artificial intelligence, 187–207. Berlin: Language Science Press. DOI:
10.5281/zenodo.6760024
A. Carré, D. Kenny, C. Rossi, P. Sánchez Gijón & O. Torres-Hostench
MT are, among other factors. These factors, which are often subsumed under
the heading of MT literacy (Bowker & Ciro 2019) have been covered in depth in
Chapters 2 to 6 of this book. Torres-Hostench (2022 [this volume]), meanwhile,
presents compelling arguments as to why MT needs to be considered as a vital
building block of multilingual societies alongside language learning. What has
been missing so far is a deeper engagement with the use of MT within language
learning. In this chapter we aim to complete the jigsaw by addressing precisely
this issue. We start by looking at the role of translation in language learning,
and then ask whether it is acceptable to use MT for this purpose, and what ben-
efits can be gained by doing so. We go on to suggest contexts in which language
learners should or should not use MT, depending on a number of contextual pa-
rameters, and given the other, often more appropriate digital resources available
to them. Finally, we give practical examples of how MT can be used in language
learning contexts.
In other words, students will get caught out not because their writing is riddled
with errors, but because it is too good for their level. A beginner learner of French
who produces a subjunctive verb form that would not normally be encountered
until they had reached an advanced stage, for example, might thus be suspected
of using MT.
But whether someone is cheating or not depends not on the technology they
are using, but on the rules of the game. If learners are forbidden from using MT
in their L2 writing, but nonetheless use the technology surreptitiously, then that
is cheating. Even if they are not expressly forbidden from using MT, but use
it without letting the teacher know, and with the intention of passing the MT
output off as their own writing, then this is still a dishonest action that is carried
out to gain some kind of advantage. Indeed, the presentation of “someone else’s
words” as one’s own belongs to the category of cheating known as plagiarism,
a topic that is addressed by Mundt & Groves (2016) in the context of MT use
in language learning. A number of studies (e.g. Correa 2011, Clifford & Munné
2013, Ducar & Schocket 2018) show, however, that attitudes to the use of MT in
language learning can differ between learners and teachers, or depending on the
extent of the use of MT, or a host of other issues, and so the situation may not
be as clear cut as it first seems.
If you are learning a language in a formal setting, the best advice is to talk to
your teacher, to make sure you understand what does and does not constitute
cheating in your particular circumstances. If you are a language teacher, the best
advice is to talk to your students to ensure that they know what is expected of
them. Either way, the wish to avoid cheating is just one thing that needs to be
taken into consideration when deciding whether or not it is acceptable to use MT
in a language learning assignment. Other considerations are listed under “situa-
tional parameters” in §3 of this chapter, and in Moorkens (2022 [this volume])
on ethics. For now, we limit ourselves to a discussion of the nature of the “ad-
vantage” that might be gained in using MT, with or without the approval of a
teacher.
Translate. He found that use of MT was associated with higher lexical diversity,
and hence better performance, as long as students continued to have access to
the tool, but once access was removed, the effect vanished. Again, the benefit
bestowed by use of MT seemed dependent on the continued availability of the
tool.
So does this mean that it is not worthwhile using MT in language learning?
Not quite. In both studies, the use of MT did not harm students in the long run;
there was simply no difference between students who used MT and students
who did not, once the tool was no longer accessible. In the short term, however,
the students who used MT did better than the others. So whether you benefit or
not from MT use appears to depend on whether you take a short-term or long-term
view, and whether you focus on a particular task as an end in itself or on your
development as a language learner.
Another lesson from O’Neill’s (2019) study is that – in the short term at least –
training matters. Learners who are trained, even briefly, in how MT works write
better compositions than those with no training.
But it should be noted that individual studies can produce seemingly conflict-
ing results. Fredholm (2015), for example, found that Swedish pupils who used
FOMT in their written compositions in Spanish made fewer mistakes in spelling
and article/noun/adjective agreement, but more mistakes in syntax and in verb
conjugation, than pupils who did not use FOMT.
Other studies are interested in learners’ metalinguistic awareness, defined as:
Enkin & Mejías-Bikandi (2016) propose (but do not test) exercises in which stu-
dents are presented with machine translations into Spanish of sentences involv-
ing “structures of interest” in English, and where contrastive differences mean
that MT traditionally has not been very successful. This is the case, for exam-
ple, with non-finite subordinate clauses that are best translated into Spanish us-
ing finite subordinate clauses. The idea is that students can reflect on where the
machine goes wrong, thus honing their own metalinguistic, and especially con-
trastive awareness. The authors note, however, that as MT improves, “materials
may need to be updated” (Enkin & Mejías-Bikandi 2016: 145). It is probably fair
to say, however, that neural MT engines for language pairs like English-Spanish
have already reached such a level of quality that it is no longer reasonable to
expect them to translate any given structure of interest incorrectly, as a matter
of course, and that the kind of exercise envisaged by these authors needs to be
rethought, so that students are encouraged to reflect on the successes, rather than
the failures, of MT.
The same observation can be made about studies that rely on learners correct-
ing errors in MT output. Not only can exposure to “bad models” (Niño 2009) be
controversial in language learning, but it might also be increasingly difficult to
spot errors in contemporary neural MT in the first place (Castilho et al. 2017,
Loock & Léchauguette 2021), making post-editing type tasks (see O’Brien 2022
[this volume]) less suitable for use in certain language learning contexts than
was previously the case (cf. Zhang & Torres-Hostench 2019).2
Thue Vold (2018) reports on another study on metalinguistic awareness. This
time learners of French as an L3 in an upper secondary school in Norway had to
read two different machine-translated versions of the same text (one translated
by Google Translate, the other by Microsoft’s Bing Translator), decide which ma-
chine translated version was better and explain why. The exact proficiency level
of the students was not ascertained, although the author (Thue Vold 2018: 73)
intimates that it was unlikely to be above B1 on the Common European Frame-
work of Reference for Languages.3 Thue Vold concludes that while the use of MT
texts to develop learners’ metalinguistic awareness has “considerable potential”,
“training, scaffolding techniques and guidance from the teacher are of paramount
importance” (ibid.: 89) as, left to their own devices, learners may not explore
fruitful avenues of analysis, and their group conversation may even reinforce
misconceptions about language (ibid.).
2 Having said that, recent studies, like that conducted by Loock & Léchauguette (2021), may
be more interested in developing MT literacy – rather than metalinguistic awareness per se
– among language learners, and teacher-guided error analysis of MT output may serve this
purpose well.
3 https://www.coe.int/en/web/common-european-framework-reference-languages
• Language learners use MT, and rather than trying to outlaw its use, it is
better to take a nuanced approach, based on an understanding of where
MT can be more or less helpful, depending, perhaps, on the extent and
context of use.
• Language learners make better use of MT when they have received appro-
priate training.
• Language learners can generally benefit more from MT if they already have
reasonably good proficiency in the foreign language (O’Neill 2012, Resende
& Way 2021).
What is notable here is that the input does not have a capital letter at the
beginning of the sentence and there is no full stop at the end. If these features of
the standard written language are reinstated, however, the output also changes
– for the better – as shown in Figure 2.
Similar issues have been observed when students copy-paste text into a FOMT
window, not realizing that they may have done so in such a way that each line
has a line break at the end of it, and what the FOMT thus sees is a series of in-
dependent lines, each of which will be translated independently of the others.
State-of-the-art MT engines are trained to translate sentences. They work best
when they can actually identify and translate full sentences. It is therefore im-
portant to make sure that you don’t “feed” text full of stray line breaks to the
machine.
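A minimal precaution, sketched below (a simple heuristic, not a full sentence segmenter), is to rejoin hard-wrapped lines before sending the text to the engine, keeping blank lines as paragraph boundaries and line breaks after sentence-final punctuation as real boundaries:

```python
import re

def rejoin_lines(text):
    """Rejoin stray line breaks so an MT engine sees full sentences
    rather than fragments. Blank lines are kept as paragraph boundaries;
    a break after sentence-final punctuation is treated as real."""
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined and not joined.endswith((".", "!", "?", ":")):
                joined += " " + line   # continuation of a wrapped sentence
            elif joined:
                joined += "\n" + line  # break after sentence-final punctuation
            else:
                joined = line
        cleaned.append(joined)
    return "\n\n".join(cleaned)
```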
• Just as the outputs of different MT engines or systems can be fruitfully
compared with each other, the usefulness of MT in language learning can
be fruitfully compared with the usefulness of competing or complemen-
tary tools, such as corpus tools or online dictionaries. The next section
elaborates on this point.
based (as explained in Chapters 2 and 7), they will produce poorer results when
too little data is available to train the system. In the examples given in this section, we use the FR<>EN language pair (looking at translations from French into
English as well as from English into French), for which current MT solutions
often produce good enough results, but we certainly encourage readers to find
examples in their language pairs and compare them with ours.
Genre also makes a difference. You may find for instance that FOMT is better
at translating essays than it is at translating poems or the lyrics of your favourite
song. This may be because the data used to train the MT system are more similar
to the former, and translated songs and poetry are probably quite rare in the train-
ing data. Poem and song translation are also particularly demanding: translated
poems and songs may have to be recitable or singable. They may require partic-
ular rhyming schemes or metres. Although machines can be trained to write and
even to translate poetry (Van de Cruys 2018, 2019, 2020), general-purpose FOMT
might not be up to the task. It is still an interesting exercise to try it out, however:
take a popular song, poem or nursery rhyme in either your L1 or L2. Find a good
human translation of it,5 one that tries to create pleasing rhymes and rhythm in
the target language. Now run the original through a FOMT engine and compare
the MT with the human translation. The results are likely to encourage you to
reflect on what MT does well, and what human translators do wonderfully.
• Dictionary entries are based on a single word, while you can get an NMT
output for as much text as you like. NMT engines are far less useful than
dictionaries because their output for an isolated word is often unreliable.9
• Dictionaries provide you with definitions, which may be the only reliable
way to make sure you have understood the meaning of a word.
• Dictionary entries are based on human intuition and (most of the time)
they are designed and/or checked by lexicographers.
• NMT outputs are based on corpora, but they are not exact quotes from the
corpora that have been used for training (as explained in Pérez-Ortiz et
al. 2022 [this volume]). The next section addresses this difference in more
detail.
Table 3: Sample NMT output for nous allons faire le nécessaire (MT by
https://www.bing.com/translator, 2021-11-01)
Table 3 presents a comparison of two queries, the first one being shorter and
more ambiguous than the second. It shows that NMT engines are able to adjust to
sentence contexts: linked with the explicit mention of an objective (pour ratifier
l’accord) a different construction is used in the English NMT output, where faire
le nécessaire becomes take the necessary steps to.
Overall, it makes much more sense to use NMT with full sentences (see Kenny
2022: §7 [this volume]) or texts than with isolated words or phrases. When look-
ing for a word, a collocation or a phrase, it might be more efficient and reliable
to use a dictionary and/or a corpus, since you will get controlled results. Parallel
corpora give you access to a series of translation choices whose context is usu-
ally easy to retrieve. MT, on the other hand, outputs results that are based on
complex computations from training data that are not always accessible (they
are typically hidden in FOMT interfaces). This can make it difficult for users to
determine whether a proposed translation is indeed reliable.
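The advice above amounts to a simple decision rule, which can be sketched playfully in code. In the toy function below, the three-token threshold and the tool labels are invented for illustration; in practice the choice also depends on what you are looking for (a definition, a collocation, a full translation).

```python
def suggest_tool(query: str) -> str:
    """Toy router: isolated words and short phrases go to controlled
    resources; full sentences give an NMT engine enough context to work with."""
    tokens = query.split()
    if len(tokens) <= 3:
        return "dictionary or corpus"  # controlled, verifiable results
    return "NMT"  # sentence context helps the engine disambiguate

print(suggest_tool("nécessaire"))
print(suggest_tool("nous allons faire le nécessaire pour ratifier l'accord"))
```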
6 Error analysis
One of the key skills that is needed to leverage MT for the purposes of second or
foreign language learning is a keen awareness of errors. Various activities can be devised to develop this skill, but because published proposals are still scarce, we provide readers with one commented example in what follows.10
Based on a text and its MT, learners make a list of the types of errors they are
able to detect and correct in the language pair in question. They submit the list to
their teacher and then receive their teacher’s feedback and a second list contain-
ing all the errors they hadn’t noticed, with additional explanations, suggestions
for further improvements and helpful examples. This exercise will be difficult
if the target (machine translated) text is in the L2, and we suggest that teach-
ers might start with translation into the mother tongue. While such tasks have
been excluded from the language classroom for a long time, as indicated in the
earlier part of this chapter, recent proposals integrating translation into situated
tasks have been made, with a view to turning the learner into a “self-reflective,
interculturally competent and responsible meaning maker in our increasingly
multilingual world” (Laviosa 2014: 105).
Table 4 contains an example of an NMT output for a short text translated
from English into French.11 Errors in the NMT output are highlighted in bold
and commented on below.12
10 Further recent ideas on the integration of MT into language teaching and learning can be found in Vinall & Hellmich (2022).
11 The text is taken from a textbook for French learners of English (Joyeux 2019: 22). In order to turn this activity into a situated task, learners could be asked to provide a good translation to a French person with virtually no knowledge of English (e.g. a visitor to the class on a special occasion). They would need to receive minimal information about MT and about the need to correct the output that has been provided to them.
12 Error analyses of MT output generally depend on error typologies, which list various types of problems that can be found in MT output. These usually incorporate accuracy errors (e.g. the meaning of the target segment is not consistent with that of the source segment) and errors that affect the fluency or well-formedness of the target segment (e.g. errors in grammatical agreement, word order, collocation, etc.). For more information, see Rossi & Carré (2022 [this volume]) on MT evaluation and O'Brien (2022 [this volume]) on post-editing.
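An error typology of the kind just described can be represented very simply in code. The sketch below is a toy illustration, not any standard typology (such as MQM), and the annotated errors are invented examples: each error carries a top-level category (accuracy vs. fluency) and a subtype, and the script tallies both.

```python
from collections import Counter

# Invented annotations: (top-level category, subtype) for each detected error.
errors = [
    ("accuracy", "mistranslation"),  # meaning diverges from the source
    ("fluency", "agreement"),        # e.g. gender agreement in French
    ("fluency", "collocation"),      # unnatural word combination
    ("accuracy", "omission"),        # source content left untranslated
    ("fluency", "agreement"),
]

by_category = Counter(cat for cat, _ in errors)
by_subtype = Counter(sub for _, sub in errors)

print(by_category)
print(by_subtype)
```

Even this minimal tally lets a teacher see at a glance whether a learner's corrections cluster around fluency problems (which L1 speakers spot easily) or accuracy problems (which require checking against the source).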
Errors include overly literal translations like une journée typique de ma vie. The
plural would work best in French and typique needs rephrasing: mes journées de
lycéen (literally: ‘my days as a pupil’) would be a good solution. Note that literal translations are all the more inappropriate here because the expression a typical day is more or less fixed in the source language. For idioms like to give you an idea,
you will need to find an idiomatic expression in the target language: something
like pour vous en donner un aperçu.
Some of the errors are more linked to grammar and language use. Clitic pronouns like en would be needed in the French text, and we could for instance improve donc j’arrive tôt by turning it into donc j’y arrive en avance (‘so I arrive in advance’). In English, on the other hand, possessives are used even when possession can be inferred from the rest of the text; in such cases the definite article is preferred in French (je quitte la maison ‘I leave the house’ rather than ‘my house’). Language use also concerns the lexicon, and while it is common in
English to refer to manner of motion (a 15-minute walk to school), French is usually more neutral (15 minutes de trajet pour l’école ‘15 minutes of journey for the school’), with such details added only if necessary (e.g. à pied ‘on foot’).
There are many more possible examples, but the above will hopefully be
enough to show that although the French NMT output looks good enough, with
no major grammatical or lexical mistakes, there is still a lot of room for improve-
ment. Finding out what learners can and cannot correct will certainly be illumi-
nating for teachers (see Loock & Léchauguette (2021) on this point).
We encourage readers to get NMT outputs for their own language pair, translating in the first instance into their L1. Activities that embed L2 NMT output in situated tasks can be used at a later stage of language learning, especially if the task involves detecting and fixing errors in the output.
7 Conclusions
In this chapter we have presented some of the main findings of research to date
into the use of MT in language learning, focusing on more recent sources that
take into account the progress made in MT since the arrival on the scene of
NMT. We have also offered some basic tips on using MT in language learning,
before building on the pragmatic approach to quality evaluation presented in
Rossi & Carré (2022 [this volume]), focusing on what it implies for second and
foreign language learners. Unlike specialized translators, who may not be given a
choice about the tools they use in current translation scenarios, language learners
have choices, but they first need to decide whether and when to use MT. To
this end, we presented a list of situation-based parameters that can help them
make this decision. We also contrasted MT with complementary online tools
such as dictionaries and corpora, stressing the relative merits of each. Finally, we
proposed activities to harness the potential of NMT and include it in the second
or foreign language classroom.
References
Bowker, Lynne. 2020. Machine translation literacy instruction for international business students and business English instructors. Journal of Business & Finance Librarianship 25(1–2). 25–43. DOI: 10.1080/08963568.2020.1794739.
Bowker, Lynne & Jairo Buitrago Ciro. 2019. Machine translation and global re-
search. Bingley: Emerald Publishing.
Cambridge University Press. 2020. Cambridge advanced learner’s dictionary and
thesaurus. https://dictionary.cambridge.org/dictionary/english/.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Iacer Calixto, John Tinsley &
Andy Way. 2017. Is neural machine translation the new state of the art? The
Prague Bulletin of Mathematical Linguistics 108. 109–120. DOI: 10.1515/pralin-
2017-0013.
Chung, Eun Seon & Soojin Ahn. 2021. The effect of using machine translation
on linguistic features in L2 writing across proficiency levels and text genres.
Computer Assisted Language Learning. DOI: 10.1080/09588221.2020.1871029.
Clifford, Joan, Lisa Merschel & Joan Munné. 2013. Surveying the landscape: What is the role of machine translation in language learning? @tic. Revista d’Innovació Educativa 10. 108–121.
Cook, Guy. 2010. Translation in language teaching. Oxford: Oxford University
Press.
Correa, Maite. 2011. Academic dishonesty in the second language classroom: In-
structors’ perspectives. Modern Journal of Language Teaching Methods 1(1). 65–
79.
Dorst, Lettie, Susana Valdez & Heather Bouman. 2022. Machine translation in
the multilingual classroom. How, when and why do humanities students at a
Dutch university use machine translation? Translation and Translanguaging in
Multilingual Contexts 8(1). 49–66. DOI: 10.1075/ttmc.00080.dor.
Ducar, Cynthia & Deborah Houk Schocket. 2018. Machine translation and the L2 classroom: Pedagogical solutions for making peace with Google Translate. Foreign Language Annals 51. 779–795.
Enkin, Elizabeth & Errapel Mejías-Bikandi. 2016. Using online translators in the
second language classroom: Ideas for advanced-level Spanish. LACLIL 9(1).
138–158. DOI: 10.5294/laclil.2016.9.1.6.
Fredholm, Kent. 2015. Online translation use in Spanish as a foreign language
essay writing: Effects on fluency, complexity and accuracy. Revista Nebrija de
Lingüística Aplicada a la Enseñanza de las Lenguas 18. 7–24.
Fredholm, Kent. 2019. Effects of Google Translate on lexical diversity: Vocabulary development among learners of Spanish as a foreign language. Revista Nebrija de Lingüística Aplicada a la Enseñanza de las Lenguas 13(26). 98–117. DOI: 10.26378/rnlael1326300.
Jolley, Jason R. & Luciane Maimone. 2022. Thirty years of machine translation
in language teaching and learning: A review of the literature. L2 Journal 14(1).
26–44. http://repositories.cdlib.org/uccllt/l2/vol14/iss1/art2.
Joyeux, Maël. 2019. Fireworks, anglais. https://www.lelivrescolaire.fr/.
Kelly, Niamh & Jennifer Bruen. 2017. Using a shared L1 to reduce cognitive overload and anxiety levels in the L2 classroom. The Language Learning Journal 45(3). 368–381.
Kenny, Dorothy. 2022. Human and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 23–49. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759976.
Laviosa, Sara. 2014. Translation and language education: Pedagogic approaches ex-
plored. London/New York: Routledge.
Lee, Sangmin-Michelle. 2021. The effectiveness of machine translation in foreign language education: A systematic review and meta-analysis. Computer Assisted Language Learning 33(3). 157–175. DOI: 10.1080/09588221.2021.1901745.
Loock, Rudy & Sophie Léchauguette. 2021. Machine translation literacy and un-
dergraduate students in applied languages: Report on an exploratory study.
Revista Tradumàtica: tecnologies de la traducció 19. 204–225. DOI: 10.5565/rev/
tradumatica.281.
Moorkens, Joss. 2022. Ethics and machine translation. In Dorothy Kenny (ed.),
Machine translation for everyone: Empowering users in the age of artificial intel-
ligence, 121–140. Berlin: Language Science Press. DOI: 10.5281/zenodo.6759984.
Mundt, Klaus & Michael Groves. 2016. A double-edged sword: The merits and the policy implications of Google Translate in higher education. European Journal of Higher Education 6(4). 387–401. DOI: 10.1080/21568235.2016.1172248.
Niño, Ana. 2009. Machine translation in foreign language learning: Language
learners’ and tutors’ perceptions of its advantages and disadvantages. ReCALL
21(2). 241–258. DOI: 10.1017/S0958344009000172.
Noriega-Sánchez, María, Ángeles Carreres & Lucía Pintado Gutiérrez. 2021. In-
troduction: Translation and plurilingual approaches to language teaching and
learning. Translation and Translanguaging in Multilingual Contexts 7(1). 1–16.
O’Brien, Sharon. 2022. How to deal with errors in machine translation: Post-
editing. In Dorothy Kenny (ed.), Machine translation for everyone: Empower-
ing users in the age of artificial intelligence, 105–120. Berlin: Language Science
Press. DOI: 10.5281/zenodo.6759982.
O’Neill, Errol M. 2012. The effect of online translators on L2 writing in French. Urbana-Champaign: University of Illinois dissertation. http://hdl.handle.net/2142/34317.
O’Neill, Errol M. 2019. Training students to use online translators and dictionaries: The impact on second language writing scores. International Journal of Research Studies in Language Learning 8(2). 47–65.
Pérez-Ortiz, Juan Antonio, Mikel L. Forcada & Felipe Sánchez-Martínez. 2022.
How neural machine translation works. In Dorothy Kenny (ed.), Machine trans-
lation for everyone: Empowering users in the age of artificial intelligence, 141–164.
Berlin: Language Science Press. DOI: 10.5281/zenodo.6760020.
Machine translation for everyone
Language learning and translation have always been complementary pillars of multilin-
gualism in the European Union. Both have been affected by the increasing availability
of machine translation (MT): language learners now make use of free online MT to help
them both understand and produce texts in a second language, but there are fears that
uninformed use of the technology could undermine effective language learning. At the
same time, MT is promoted as a technology that will change the face of professional
translation, but the technical opacity of contemporary approaches, and the legal and eth-
ical issues they raise, can make the participation of human translators in contemporary
MT workflows particularly complicated. Against this background, this book attempts
to promote teaching and learning about MT among a broad range of readers, including
language learners, language teachers, trainee translators, translation teachers, and pro-
fessional translators. It presents a rationale for learning about MT, and provides both a
basic introduction to contemporary machine-learning based MT, and a more advanced
discussion of neural MT. It explores the ethical issues that increased use of MT raises,
and provides advice on its application in language learning. It also shows how users
can make the most of MT through pre-editing, post-editing and customization of the
technology.