5. Results and Discussion
5.1. Unbabel Error Typology
5.1.3. Annotators’ feedback
Choice of severities
Annotator A Annotator B
EN (source)
(34a) If you are still unable to make payment, please contact your card issuer.
JA (target)
(34b) それでもお支払いができない場合は、カード発行者にお問い合わせください。
[Lexical Selection, major]
EN (source)
(35a) If you are still unable to make payment, please contact your card issuer.
JA (target)
(35b) それでもお支払いができない場合は、カード発行者にお問い合わせください。
[Lexical Selection, minor]
EN (source)
(36a) For you to remove the current one, you have to contact the firmware provider.
ZH-CN (target)
(36b) 为了删除当前的 one ,您必须与固件提供商联系。
[Untranslated, critical]
EN (source)
(37a) For you to remove the current one, you have to contact the firmware provider.
ZH-CN (target)
(37b) 为了删除当前的 one ,您必须与固件提供商联系。
[Untranslated, major]
Table 29. Examples of severity mismatches with the Unbabel Error Typology in Japanese and Simplified Chinese
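The mismatches in Table 29 share one pattern: both annotators mark the same error with the same issue type but choose a different severity. A minimal sketch of how such cases could be flagged automatically is given here. The `Annotation` record, its field names, the segment identifiers and the `severity_mismatches` helper are all hypothetical and do not reflect the actual annotation tooling or data format. The assignment of the severities to Annotator A and Annotator B follows the order of the examples and is likewise an assumption.

```python
from dataclasses import dataclass


# Hypothetical record: the field names are assumptions for illustration only.
@dataclass(frozen=True)
class Annotation:
    segment_id: str
    issue_type: str
    severity: str  # "minor", "major" or "critical"


def severity_mismatches(annotator_a, annotator_b):
    """Return (segment_id, issue_type, severity_a, severity_b) tuples for
    errors that both annotators tagged with the same issue type but a
    different severity.

    Simplification: at most one annotation per (segment, issue type) pair.
    """
    severity_b = {(ann.segment_id, ann.issue_type): ann.severity
                  for ann in annotator_b}
    mismatches = []
    for ann in annotator_a:
        other = severity_b.get((ann.segment_id, ann.issue_type))
        if other is not None and other != ann.severity:
            mismatches.append(
                (ann.segment_id, ann.issue_type, ann.severity, other))
    return mismatches


# The two cases from Table 29, keyed by hypothetical segment identifiers.
annotator_a = [
    Annotation("ja-34", "Lexical Selection", "major"),
    Annotation("zh-36", "Untranslated", "critical"),
]
annotator_b = [
    Annotation("ja-34", "Lexical Selection", "minor"),
    Annotation("zh-36", "Untranslated", "major"),
]

for row in severity_mismatches(annotator_a, annotator_b):
    print(row)
```

Keying on the segment and issue type keeps severity disagreements separate from cases where the annotators disagree on the issue type itself, which would need a different comparison.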
Although in some cases, such as the double annotation of one Omission error mentioned in example 24, the inconsistencies in the annotations stem from a poor interpretation of the annotation guidelines, there are many instances where the results of these annotations prove the need for a new typology and guidelines, as the current ones are unclear at times.
pairs and which additional issue types they believed were missing in the typology. Among other languages, these surveys gathered the input of the annotators for the translation directions of English to Japanese and English to Simplified Chinese. Table 30 shows which issue types were pointed out as not applicable for each of these two language pairs, while Table 31 lists the issue types that were proposed for each LP by the annotators.
Annotators’ Feedback
Not applicable issue types Japanese Simplified Chinese
Ambiguous Translation ✓
Overly Literal ✓
False Friend ✓ ✓
Source/Target Disagreement ✓ ✓
Wrong Paronym ✓
POS ✓
Capitalization ✓
Diacritics ✓ ✓
Hyphenation ✓ ✓
Wrong Language Variety ✓
Hypernym/Hyponym ✓
Synonym ✓
Mistranslated Term ✓
Term Wrongly Applied ✓
Tense/Mood/Aspect ✓
Omitted Auxiliary Verb ✓
Omitted Determiner ✓
Agreement ✓
Table 30. Annotators’ feedback on not applicable issue types in the Unbabel Error Typology
Annotators’ feedback
Proposed issue types     Japanese     Simplified Chinese
Inappropriate            ✓
Omitted Aspect Marker                 ✓
Omitted Argument                      ✓
Omitted Adjunct                       ✓
Omitted Particle                      ✓
Omitted Classifier                    ✓
Wrong Classifier                      ✓
Table 31. Annotators’ feedback on missing issue types in the Unbabel Error Typology
When analyzing Table 30, it becomes clear that the annotators of both languages consider many of the specific issue types under Mistranslation unnecessary and confusing; some of these were also pointed out as being unclearly defined in the annotation guidelines for this typology.
In addition, the issue types contained under Spelling were also considered not useful by the annotators of both language pairs, either because the usage guidelines are unclear or, in the case of Chinese, because the issue types do not apply. Similarly, issue types such as Hyphenation, Diacritics and Capitalization were also identified as unnecessary due to the characteristics of the languages at hand. Furthermore, the Simplified Chinese annotators also identified Tense/Mood/Aspect as an unnecessary issue type, as verbs in Chinese are not inflected and Tense/Mood/Aspect is expressed through the use of particles, adverbs and auxiliary verbs.
Finally, it is also visible that Terminology issue types have been marked as unnecessary. This is not due to the linguistic characteristics of either translation direction, as terminology is a part of both. Rather, it is likely that the existence of three different issue types under Terminology is confusing for annotators, who end up considering at least one of them superfluous.
Regarding the issue types the annotators suggested should be added, the Chinese annotators requested a further distinction of categories within Omission and one additional issue type under Function Words for wrong classifiers, while the Japanese annotators suggested only one new issue type, named “Inappropriate”, which, in the proposed definition, was very similar to the already existing Overly Literal issue type.
However, it should be noted that during the round of annotations performed for this thesis using the Unbabel Error Typology, one of the Japanese annotators was the only one to leave feedback regarding this typology, in which they asked precisely to report missing issue types, using Missing Classifier as an example.
When analyzing the results of the annotations performed with this typology, it was necessary to keep in mind two important factors that could have opposing effects on the results.
On the one hand, this typology was the most extensive of the three under analysis, with a total of 47 selectable issue types, and it was also the typology considered to be in need of an adaptation more suitable for the annotation of East Asian languages.
On the other hand, it was also the typology the annotators were most familiar with.
This means that, while there was already concrete feedback on this typology and the changes it needed in the eyes of the annotators, familiarity with the typology was expected to keep the IAA results in particular from being extremely low. This expectation was largely confirmed for Japanese but not for the Simplified Chinese annotations, which might be due to the fact that, as seen in Section 5.1.3., it is the language pair for which the annotators made more remarks in relation to missing categories.
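To make the notion of IAA concrete, the following sketch computes agreement between two annotators over severity labels. The agreement metric is not restated in this section, so Cohen's kappa is assumed here purely for illustration. The `cohen_kappa` function name and the toy labels are invented and are not taken from the annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from each annotator's own
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented toy severities for four errors marked by both annotators.
toy_a = ["minor", "major", "major", "critical"]
toy_b = ["minor", "minor", "major", "major"]

print(round(cohen_kappa(toy_a, toy_b), 3))
```

Because kappa corrects for chance agreement, two annotators who both default to the same frequent severity are not rewarded for it, which matters when one severity dominates the data.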