2.4 Baidu ERNIE-M
2.4.3 Experimental results
ERNIE-M, which is built on the PaddlePaddle⁸ framework, is trained on two types of corpora: monolingual and parallel. The training corpus totals approximately 1.5 trillion characters covering 96 languages, including Chinese, English, French, Afrikaans, Albanian, Amharic, Sanskrit, Arabic, Armenian, Assamese, and Azerbaijani. A wide range of publicly available datasets were used to test the efficacy of ERNIE-M on five distinct tasks: cross-lingual natural language inference, reading comprehension, named entity recognition, semantic similarity, and cross-lingual retrieval. ERNIE-M obtained the best results among the compared models on all five.
To determine its effectiveness, the Baidu researchers evaluated ERNIE-M in two settings.
1. Cross-lingual Transfer: In this setting, a model trained on English is tested directly on other languages to check whether it can understand them. For example, if the model learns from training data in one language that "This restaurant has a comfortable environment" expresses positive sentiment, it should also determine that "I am very happy.", given in another language, expresses positive sentiment. In practical applications, when labeled data is lacking for a specific language, this technique allows a multilingual model to be trained on labeled data from other languages. This reduces the difficulty of building systems for low-resource languages, even when no labeled data is available in the language of interest.
⁸ https://www.paddlepaddle.org.cn/en
2. Multi-language Fine-tuning: In this setting, the model is trained on the labeled data of all languages in a multi-task manner. It verifies whether, when labeled data is available for the target language, the model can also exploit the labeled data of other languages to further improve its understanding of that language.
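The difference between the two settings comes down to which labeled examples are used for fine-tuning, while testing is done per language in both cases. The following is a minimal sketch under stated assumptions, not the researchers' evaluation code: the example format (dictionaries with `lang`, `text`, and `label` keys), the function names, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def select_training_data(examples, mode, source_lang="en"):
    """Pick the labeled examples used for fine-tuning in each setting."""
    if mode == "cross_lingual_transfer":
        # Only the source language (English here) contributes labeled data.
        return [ex for ex in examples if ex["lang"] == source_lang]
    if mode == "multi_language_fine_tuning":
        # Labeled data from every language is used.
        return list(examples)
    raise ValueError(f"unknown mode: {mode}")


def per_language_accuracy(examples, predict):
    """Accuracy of the `predict` callable, computed separately per language."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["lang"]] += 1
        correct[ex["lang"]] += int(predict(ex["text"]) == ex["label"])
    return {lang: correct[lang] / total[lang] for lang in total}


def average_accuracy(per_lang):
    """Unweighted mean of the per-language accuracies."""
    return sum(per_lang.values()) / len(per_lang)


# Hypothetical toy data, for illustration only.
toy_examples = [
    {"lang": "en", "text": "good", "label": "positive"},
    {"lang": "en", "text": "bad", "label": "negative"},
    {"lang": "de", "text": "gut", "label": "positive"},
    {"lang": "pt", "text": "ruim", "label": "negative"},
]
```

Here `predict` stands in for a fine-tuned model. Averaging per-language scores without weighting treats every language equally regardless of test-set size, which is one common way to report a single cross-lingual number.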
The ERNIE-M experiments also verified the effect of the model in several application fields, including cross-lingual retrieval, natural language inference, reading comprehension, and named entity recognition, which are summarized as follows.
In the Cross-Lingual Information Retrieval task, semantically identical sentences are retrieved from bilingual corpora, as illustrated in Figure 14. With ERNIE-M, users can search in one language, such as English, and retrieve results in other languages, such as Portuguese, French, or German. On Tatoeba⁹, ERNIE-M achieves a cross-lingual retrieval accuracy of 87.9%.
Figure 14: ERNIE-M on Cross-Lingual Information Retrieval task
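Retrieval accuracy of this kind can be sketched as follows. The text does not describe how the retrieval was scored, so this example assumes a common setup: each sentence is represented by an embedding vector, the nearest target sentence by cosine similarity is retrieved, and a retrieval counts as correct when it is the true translation. The function names and the convention that `source_vecs[i]` and `target_vecs[i]` are translations of each other are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_accuracy(source_vecs, target_vecs):
    """Fraction of source sentences whose nearest target is their translation.

    Assumes source_vecs[i] and target_vecs[i] embed a translation pair.
    """
    hits = 0
    for i, src in enumerate(source_vecs):
        # Index of the target sentence most similar to this source sentence.
        best = max(range(len(target_vecs)),
                   key=lambda j: cosine(src, target_vecs[j]))
        hits += int(best == i)
    return hits / len(source_vecs)
```

In practice the vectors would come from the multilingual encoder. The brute-force search shown here is quadratic in the number of sentences, which is acceptable for small test sets.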
⁹ https://paperswithcode.com/dataset/tatoeba
Natural language inference serves as a benchmark task in natural language understanding. It is regarded as one of the most challenging tasks because it aims to determine the logical relationship between two sentences. Table 2 shows two examples. The multilingual Cross-lingual Natural Language Inference (XNLI)¹⁰ dataset covers 15 languages, including major languages such as English and French as well as less widely used languages such as Swahili.
Language   | Sentence 1                    | Sentence 2                  | Label
English    | You don’t have to stay there. | You can leave.              | Related
Portuguese | Maria tem medo de água.       | Maria gosta muito de nadar. | Contradictory
Table 2: Example of Natural Language Inference
The effectiveness of ERNIE-M was verified in both the Cross-lingual Transfer and Multi-language Fine-tuning settings. The researchers fine-tuned ERNIE-M on English and tested it on Chinese, German, and Urdu, achieving an average accuracy of 82.0%. The accuracy improves further to 84.2% when the training corpus of all languages is used (Ouyang et al., 2021).
Table 3: Evaluation results on XNLI cross-lingual natural language inference¹¹
¹⁰ https://cims.nyu.edu/~sbowman/xnli/
¹¹ https://arxiv.org/abs/2012.15674
The goal of the Cross-lingual Question Answering task is to answer specific questions based on a given text. To assess ERNIE-M on reading comprehension, the researchers evaluated it on the MultiLingual Question Answering (MLQA) dataset (Lewis et al., 2020) proposed by Facebook. In this task, the model is first trained on English and then tested on datasets in other languages. The task measures how well the model handles cross-lingual question answering and supports the construction of cross-lingual question answering systems. The results are shown in Table 5. When ERNIE-M is trained in only one language, it answers 50.2% of the questions in different languages completely correctly (Ouyang, 2021).
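A figure for questions "completely answered correctly" is usually an exact-match score. The sketch below assumes that reading and uses a simplified normalization (lowercasing, removing ASCII punctuation, collapsing whitespace). Official question answering evaluation scripts typically apply additional, language-specific normalization, so this is an illustration of the idea rather than the metric used for Table 5. All function names are hypothetical.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the prediction equals any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def exact_match_rate(predictions, references):
    """Share of questions whose predicted answer is an exact match.

    `references` holds one list of acceptable gold answers per question.
    """
    matches = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return matches / len(predictions)
```

Exact match is strict: a prediction such as "Paris, France" does not match the gold answer "Paris", which is why this score is normally reported alongside a token-overlap F1.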
Table 5: Accuracy of MLQA data under each model¹²
¹² https://developer.baidu.com/article/detail.html?id=292593
The goal of the named entity recognition task is to identify information such as the names of people, places, times, and institutions in text. It helps people quickly extract valuable information from large volumes of text.
Table 6: F1-Score of CoNLL data under each model¹³
As shown in Table 6, a multilingual model can help with information extraction in lower-resource languages. ERNIE-M was evaluated on the CoNLL dataset (Sang & de Meulder, 2003), and its effect was verified in both the Cross-lingual Transfer and Multi-language Fine-tuning settings. The researchers fine-tuned ERNIE-M on English and tested it on Dutch, Spanish, and German, achieving an average F1 of 81.6%, which increases to 90.8% when the training corpus of all languages is used.
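For named entity recognition, F1 is normally computed over whole entities rather than individual tokens. The sketch below assumes BIO-style tags such as `B-PER`, `I-PER`, and `O`, and counts a predicted entity as correct only when its span and type both match the gold annotation. It is a simplified illustration, not the evaluation script used for Table 6, and the function names are hypothetical.

```python
def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence."""
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient choice: an I- tag with no open entity starts a new one.
            start, etype = i, tag[2:]
    return entities


def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged entity-level F1 over a list of tagged sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_set, pred_set = extract_entities(gold), extract_entities(pred)
        tp += len(gold_set & pred_set)
        n_gold += len(gold_set)
        n_pred += len(pred_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An average such as the one reported above would then be the mean of the per-language F1 values, assuming each language's test set is scored separately.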