
3.1 ASPECT AND OPINION TERMS EXTRACTION

3.1.2 TOWE

architectures but still worse than the proposed IOG model, approximately 10% lower in F1-score.

Following the work of Fan et al., other recent works have tried to improve results on the task by using new architectures and training objectives, transferring knowledge from external corpora, and even proposing new labeling models. We describe these approaches below.

Wu et al. (2020b) aim at transferring knowledge from an external review corpus to improve the Target-Oriented Opinion Word Extraction (TOWE) task. To that end, the authors propose the Latent Opinions Transfer Networks (LOTN) model, which consists of two components: the first is a simple position- and word-embedding-based BiLSTM network, called PE-BiLSTM, that performs the actual TOWE task; the second is a pre-trained sentiment classification model responsible for retrieving global, target-independent word-level representations from the input review sentence. The proposed model works as follows: the review sentence is fed to the pre-trained sentiment classification model, which outputs hidden states and attention weights relative to the sentiment classification task. This information is then concatenated to the hidden states of the PE-BiLSTM network. Hence, the representation used by the TOWE module contains both task-specific context representation and external opinion knowledge.
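
To make the concatenation step concrete, the following is a minimal PyTorch sketch of a PE-BiLSTM whose states are augmented with the transferred sentiment representations. The module names, dimensions, and the assumption that the pre-trained sentiment model exposes word-level hidden states of matching length are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PEBiLSTM(nn.Module):
    """Position- and word-embedding-based BiLSTM for TOWE (illustrative sketch)."""
    def __init__(self, vocab_size, max_len, emb_dim=100, pos_dim=50, hidden=128, n_tags=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(max_len, pos_dim)   # position relative to the target
        self.bilstm = nn.LSTM(emb_dim + pos_dim, hidden,
                              batch_first=True, bidirectional=True)
        # BiLSTM states (2 * hidden) concatenated with the transferred
        # sentiment states (assumed here to also be 2 * hidden wide).
        self.classifier = nn.Linear(2 * hidden + 2 * hidden, n_tags)

    def forward(self, words, positions, sentiment_states):
        # sentiment_states: word-level hidden states from the pre-trained
        # sentiment classifier, shape (batch, seq_len, 2 * hidden) -- an assumption.
        x = torch.cat([self.word_emb(words), self.pos_emb(positions)], dim=-1)
        h, _ = self.bilstm(x)
        # Concatenate task-specific context with external opinion knowledge.
        h = torch.cat([h, sentiment_states], dim=-1)
        return self.classifier(h)   # per-token tagging logits
```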

To transform the global, target-independent opinion information coming from the sentiment classification module into target-dependent information, the authors follow the assumption that "the word that is closer to the opinion target is more likely to be the opinion word of the target". This premise is implemented by a target-relevant distance weight function that considers the sentence length and the relative position of each word to the target word. The authors perform experiments on the TOWE benchmark datasets, using the Amazon Review and Yelp Review corpora to train the sentiment classification model used for latent opinion transfer. The experiments show that the proposed method outperforms the state-of-the-art IOG model by 1.98% and 2.02% in F1-score on the restaurant datasets from SemEval 2014 and 2015, respectively.
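
The exact weighting scheme is defined in the paper; the snippet below is only one plausible instantiation of the stated premise, decaying weights linearly with the distance from the target, normalized by sentence length (the linear form is an assumption made here for illustration).

```python
def target_relevant_weights(seq_len: int, target_pos: int) -> list[float]:
    """Weight each word by its proximity to the opinion target.

    Encodes the premise that words closer to the target are more likely
    to be its opinion words; the linear decay normalized by sentence
    length is an illustrative choice, not necessarily the paper's formula.
    """
    return [1.0 - abs(i - target_pos) / seq_len for i in range(seq_len)]

# Example: in a 6-word sentence with the target at index 2,
# words adjacent to the target receive the highest weights.
print(target_relevant_weights(6, 2))  # ~[0.67, 0.83, 1.0, 0.83, 0.67, 0.5]
```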

Zhou et al. (2020) argue that most previous methods for the task rely on the sequential representation of the sentence, ignoring the dependency structure between the target and opinion words. Hence, the authors propose a neural architecture based on Graph Convolutional Networks (GCN) which captures the syntactic structure of the sentence and the syntactic relations between the terms. According to the authors, this approach circumvents the difficulty of capturing dependencies between words when the opinion is far from the opinion target. The authors also augment the training of the proposed model with adversarial training, adding small perturbations to the input word embeddings, which can enhance the generalization and robustness of the model.

The model is composed of a BiLSTM encoder, which learns the contextual representation of the words in the sentence, and a GCN applied over the dependency tree to compute the syntactic representation of the sentence. Both representations are integrated to predict the label of each word. The adversarial examples are created during training and act as noise: they are built by adding worst-case perturbations to the original word embeddings, i.e., the perturbations that maximize the loss function. Training combines two loss terms, the cross-entropy loss on the original samples and on the adversarial ones. The results of the performed experiments show superior performance compared to the IOG model for the TOWE task and to other basic architectures that use only a BiLSTM and do not consider syntactic features for sequence classification. The authors also benchmark the proposed model against distance-rule and dependency-rule models, which have already been shown to perform worse on the task.
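
The worst-case perturbation is typically approximated in closed form from the gradient of the loss with respect to the embeddings, as in standard adversarial training on text (Miyato-style); the sketch below assumes this common approximation rather than the paper's exact variant.

```python
import torch

def adversarial_perturbation(loss, embeddings, epsilon=1.0):
    """Approximate the worst-case perturbation on the word embeddings.

    Linearizes the loss around the input: the perturbation points along
    the L2-normalized gradient, which locally maximizes the loss.
    `epsilon` bounds the perturbation norm. `embeddings` must require grad.
    """
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    r_adv = epsilon * grad / (grad.norm() + 1e-12)
    return r_adv.detach()

# Training then combines the two cross-entropy terms described above:
#   total_loss = ce(model(emb), labels) + ce(model(emb + r_adv), labels)
```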

Zhang et al. (2021a) address the TOWE problem similarly to Zhou et al. (2020). The authors propose the use of GCNs for capturing syntactic relations between aspect and opinion words. However, to address the challenges of using GCNs, the authors integrate a memory mechanism that updates the hidden states of each node with historical, local-feature, and contextual information. The model takes word and positional embeddings as inputs to a BiLSTM encoder, which processes the sequence and integrates its representations with the hidden states of the GCN and the memory cells. The graph of syntactic relations is split into multiple subgraphs, where each node is assigned a memory cell. In a recurrent manner, each node is updated to build the final node representation. The cross-entropy function is used as the training criterion, together with a custom loss function. The authors benchmark the proposed model against traditional distance-rule, dependency-rule, sequential, and pipelined models; state-of-the-art models such as IOG and LOTN are also compared to the proposed work. The experimental results show that the proposed model outperforms the other analyzed works.
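
A single GCN layer over the dependency tree amounts to propagating each word's representation to its syntactic neighbors through the adjacency matrix (with self-loops). The sketch below shows this standard propagation step only; the paper's memory mechanism and subgraph splitting are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DependencyGCNLayer(nn.Module):
    """One GCN layer over a dependency-tree adjacency matrix (standard form)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # h:   (batch, seq_len, in_dim) word representations from the encoder.
        # adj: (batch, seq_len, seq_len) 0/1 dependency adjacency with self-loops.
        # Normalize by node degree so high-degree words do not dominate.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
        h = torch.bmm(adj, self.linear(h)) / deg
        return F.relu(h)
```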

Zhang et al. (2021b) approach the TOWE task as a question-answering problem. They build a multiview-trained machine reading comprehension model, which consists of training a Machine Reading Comprehension (MRC) model and splitting the problem into three separate views: identifying opinions oriented to a given target (TOWE), Opinion-Related Aspect Targets Extraction (OATE), and Target-Opinion Pair Relation Classification (PRC). The authors use three question templates to automatically build questions that guide model training, and introduce a MultiView Training (MVT) strategy that captures the common knowledge obtained from those different views. To learn contextualized representations for each token, the authors use Bidirectional Encoder Representations from Transformers (BERT) as the MRC model. The TOWE and OATE views receive as input the last hidden states from the BERT transformer and pass them through a softmax function, using cross-entropy as the training criterion. The PRC view receives only the last hidden state corresponding to the [CLS] token, which is also passed through a softmax function and the cross-entropy loss. For MVT, the authors introduce a meta-learning approach whose goal is to learn parameter initializations that can quickly adapt to all three tasks with only a small amount of training data. The framework initializes all training views with the parameters learned by the meta-learning approach and then fine-tunes the model on the final TOWE task.
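
The following is a minimal sketch of how a question-style input for the TOWE view could be built and encoded with BERT. The template wording, example sentence, and the three-way tag projection are hypothetical; the actual templates are defined in the paper.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

sentence = "The battery life is great but the screen is dim ."
target = "battery life"
# Hypothetical template; the paper defines its own three templates.
question = f"What are the opinion words for {target} ?"

# BERT encodes the pair as: [CLS] question [SEP] sentence [SEP].
inputs = tokenizer(question, sentence, return_tensors="pt")
outputs = bert(**inputs)

hidden = outputs.last_hidden_state   # (1, seq_len, 768): token-level states
cls_state = hidden[:, 0]             # [CLS] state, consumed by the PRC view

# The TOWE/OATE views project token states to per-token tags via softmax
# (an untrained projection here, purely for illustration).
tag_logits = torch.nn.Linear(768, 3)(hidden)
tag_probs = tag_logits.softmax(dim=-1)
```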
