Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
Summary
This is a research paper on aspect-based sentiment analysis (ABSA). The authors propose converting (T)ABSA into a sentence-pair classification task by constructing an auxiliary sentence, fine-tune BERT on the resulting task, and achieve state-of-the-art results on the SentiHood and SemEval-2014 Task 4 datasets.
Full Transcript
Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence

Chi Sun, Luyao Huang, Xipeng Qiu*
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China
{sunc17,lyhuang18,xpqiu}@fudan.edu.cn
*Corresponding author.

Published in Proceedings of NAACL-HLT 2019, pages 380–385, Minneapolis, Minnesota, June 2–7, 2019. © 2019 Association for Computational Linguistics.

Abstract

Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on the SentiHood and SemEval-2014 Task 4 datasets. The source code is available at https://github.com/HSLCY/ABSA-BERT-pair.

1 Introduction

Sentiment analysis (SA) is an important task in natural language processing. It concerns the computational processing of opinions, emotions, and subjectivity: sentiment is collected, analyzed, and summarized. It has received much attention not only in academia but also in industry, where online reviews on websites such as Amazon provide real-time feedback on customers' opinions of specific products or services. The underlying assumption of this task is that the entire text has an overall polarity.

However, a user's comment may address several aspects at once, for example: "This book is a hardcover version, but the price is a bit high." The polarity regarding 'appearance' is positive, while the polarity regarding 'price' is negative. Aspect-based sentiment analysis (ABSA) (Jo and Oh, 2011; Pontiki et al., 2014, 2015, 2016) aims to identify fine-grained polarity towards a specific aspect. This task allows users to evaluate aggregated sentiments for each aspect of a given product or service and gain a more granular understanding of its quality.

Both SA and ABSA are sentence-level or document-level tasks, but one comment may refer to more than one object, and sentence-level tasks cannot handle sentences with multiple targets. Therefore, Saeidi et al. (2016) introduced the task of targeted aspect-based sentiment analysis (TABSA), which aims to identify fine-grained opinion polarity towards a specific aspect associated with a given target. The task can be divided into two steps: (1) determine the aspects associated with each target; (2) resolve the polarity of each aspect with respect to that target.

The earliest work on (T)ABSA relied heavily on feature engineering (Wagner et al., 2014; Kiritchenko et al., 2014), and subsequent neural network-based methods (Nguyen and Shirai, 2015; Wang et al., 2016; Tang et al., 2015, 2016; Wang et al., 2017) achieved higher accuracy. More recently, Ma et al. (2018) incorporated commonsense knowledge into a deep neural network to further improve results, and Liu et al. (2018) optimized a memory network to better capture linguistic structure.

Pre-trained language models, such as ELMo (Peters et al., 2018), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have shown their effectiveness in alleviating the effort of feature engineering. BERT in particular has achieved excellent results on QA and NLI. However, directly applying the pre-trained BERT model to the (T)ABSA task yields little improvement (see Table 3). We attribute this to an inappropriate use of the pre-trained model.

Since the input representation of BERT can represent both a single sentence and a pair of sentences, we can convert (T)ABSA into a sentence-pair classification task and fine-tune the pre-trained BERT.

In this paper, we investigate several methods of constructing an auxiliary sentence that transform (T)ABSA into a sentence-pair classification task. We fine-tune the pre-trained BERT model and achieve new state-of-the-art results on the (T)ABSA task. We also conduct a comparative experiment to verify that sentence-pair classification outperforms single-sentence classification with fine-tuned BERT, which shows that the improvement comes not only from BERT but also from our method. In particular, our contribution is two-fold:

1. We propose a new solution to (T)ABSA by converting it to a sentence-pair classification task.
2. We fine-tune the pre-trained BERT model and achieve new state-of-the-art results on the SentiHood and SemEval-2014 Task 4 datasets.

Example:
LOCATION2 is central London so extremely expensive, LOCATION1 is often considered the coolest area of London.

Target  Aspect            Sentiment
LOC1    general           Positive
LOC1    price             None
LOC1    safety            None
LOC1    transit-location  None
LOC2    general           None
LOC2    price             Negative
LOC2    safety            None
LOC2    transit-location  Positive

Table 1: An example of the SentiHood dataset.

2 Methodology

In this section, we describe our method in detail.

2.1 Task description

TABSA. In TABSA, a sentence s usually consists of a series of words {w_1, ..., w_m}, some of which, {w_i1, ..., w_ik}, are pre-identified targets {t_1, ..., t_k}. Following Saeidi et al. (2016), we set the task as a 3-class classification problem: given the sentence s, a set of target entities T, and a fixed aspect set A = {general, price, transit-location, safety}, predict the sentiment polarity y ∈ {positive, negative, none} over the full set of target-aspect pairs {(t, a) : t ∈ T, a ∈ A}. As we can see in Table 1, the gold-standard polarity of (LOCATION2, price) is negative, while the polarity of (LOCATION1, price) is none.

ABSA. In ABSA, the target-aspect pairs {(t, a)} reduce to aspects a only. This setting is equivalent to jointly learning subtask 3 (Aspect Category Detection) and subtask 4 (Aspect Category Polarity) of SemEval-2014 Task 4 (http://alt.qcri.org/semeval2014/task4/).
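To make the label space concrete, here is a minimal Python sketch (ours, not code from the paper) that enumerates the full set of target-aspect pairs a TABSA model must classify for the sentence in Table 1; the gold labels are copied from the table, and every unlisted pair defaults to "none":

```python
from itertools import product

# Fixed aspect set used by SentiHood (Saeidi et al., 2016).
ASPECTS = ["general", "price", "transit-location", "safety"]
LABELS = ["positive", "negative", "none"]

def tabsa_label_space(targets):
    """Every (target, aspect) pair that receives a 3-class label."""
    return list(product(targets, ASPECTS))

# The Table 1 sentence has two pre-identified targets.
pairs = tabsa_label_space(["LOCATION1", "LOCATION2"])
print(len(pairs))  # 8 pairs, each labeled positive / negative / none

# Gold labels from Table 1 (all other pairs are "none").
gold = {
    ("LOCATION1", "general"): "positive",
    ("LOCATION2", "price"): "negative",
    ("LOCATION2", "transit-location"): "positive",
}
for t, a in pairs:
    print(t, a, gold.get((t, a), "none"))
```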
2.2 Construction of the auxiliary sentence

For simplicity, we mainly describe our method with TABSA as an example. We consider the following four methods to convert the TABSA task into a sentence-pair classification task:

Methods  Output    Auxiliary Sentence
QA-M     S.P.      Question w/o S.P.
NLI-M    S.P.      Pseudo-sentence w/o S.P.
QA-B     {yes,no}  Question w/ S.P.
NLI-B    {yes,no}  Pseudo-sentence w/ S.P.

Table 2: The construction methods. Due to limited space, we use the following abbreviations: S.P. for sentiment polarity, w/o for without, and w/ for with.

Sentences for QA-M. The sentence we generate from the target-aspect pair is a question with a fixed format. For example, for the target-aspect pair (LOCATION1, safety), the generated sentence is "what do you think of the safety of location - 1 ?".

Sentences for NLI-M. For the NLI task, the conditions we set when generating sentences are less strict and the form is much simpler. The sentence created here is not a standard sentence but a simple pseudo-sentence. Taking the (LOCATION1, safety) pair as an example, the auxiliary sentence is "location - 1 - safety".

Sentences for QA-B. For QA-B, we add the label information and temporarily convert TABSA into a binary classification problem (label ∈ {yes, no}) to obtain a probability distribution. Each target-aspect pair now generates three sequences, such as "the polarity of the aspect safety of location - 1 is positive", "the polarity of the aspect safety of location - 1 is negative", and "the polarity of the aspect safety of location - 1 is none". We use the probability value of yes as the matching score: for the three sequences (positive, negative, none) generated by a target-aspect pair, we take the class of the sequence with the highest matching score as the predicted category.

Sentences for NLI-B. NLI-B differs from QA-B only in that the auxiliary sentence changes from a question to a pseudo-sentence. The auxiliary sentences are "location - 1 - safety - positive", "location - 1 - safety - negative", and "location - 1 - safety - none".

After constructing the auxiliary sentence, we can transform the TABSA task from a single-sentence classification task into a sentence-pair classification task. As Table 3 shows, this is a necessary operation that significantly improves the experimental results on TABSA.
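The four construction patterns above are simple enough to express as string templates. The following sketch is ours, not code from the authors' repository; the rendering of target names as "location - 1" is copied from the paper's examples:

```python
LABELS = ["positive", "negative", "none"]

def norm(target):
    # "LOCATION1" -> "location - 1", following the paper's examples.
    return target.lower().replace("location", "location - ")

def qa_m(target, aspect):
    return f"what do you think of the {aspect} of {norm(target)} ?"

def nli_m(target, aspect):
    return f"{norm(target)} - {aspect}"

def qa_b(target, aspect):
    # One hypothesis per candidate polarity; the model answers yes/no.
    return [f"the polarity of the aspect {aspect} of {norm(target)} is {y}"
            for y in LABELS]

def nli_b(target, aspect):
    return [f"{norm(target)} - {aspect} - {y}" for y in LABELS]

print(qa_m("LOCATION1", "safety"))
# what do you think of the safety of location - 1 ?
print(qa_b("LOCATION1", "safety")[0])
# the polarity of the aspect safety of location - 1 is positive
```

At inference time in the -B settings, each of the three hypotheses is scored by the binary classifier and the polarity whose hypothesis receives the highest probability of yes is predicted, e.g. `LABELS[max(range(3), key=lambda i: p_yes[i])]` for some hypothetical score vector `p_yes`.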
2.3 Fine-tuning pre-trained BERT

BERT (Devlin et al., 2018) is a language representation model that pre-trains bidirectional Transformers on a large corpus and fine-tunes the pre-trained model on downstream tasks. We fine-tune the pre-trained BERT model on the TABSA task. Let us take a brief look at the input representation and the fine-tuning procedure.

2.3.1 Input representation

The input representation of BERT can explicitly represent a pair of text sentences in a single sequence of tokens. For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. For classification tasks, the first token of each sequence is a special classification embedding ([CLS]).

2.3.2 Fine-tuning procedure

BERT fine-tuning is straightforward. To obtain a fixed-dimensional pooled representation of the input sequence, we use the final hidden state (i.e., the output of the Transformer) of the first token. We denote this vector as C ∈ R^H. We then add a classification layer whose parameter matrix is W ∈ R^{K×H}, where K is the number of categories. Finally, the probability of each category P is calculated by the softmax function P = softmax(CW^T).
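In code, this classification head is a single linear layer over the [CLS] vector. A minimal PyTorch sketch (ours; the encoder producing `cls_vec` stands in for BERT-base, and note that `nn.Linear` adds a bias term the formula above omits):

```python
import torch
import torch.nn as nn

H, K = 768, 3  # BERT-base hidden size; K = |{positive, negative, none}|

class ClassificationHead(nn.Module):
    """P = softmax(C W^T), as in Section 2.3.2."""
    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)  # weight is W ∈ R^{K×H}

    def forward(self, cls_vec: torch.Tensor) -> torch.Tensor:
        # cls_vec: final hidden state of the [CLS] token, shape (batch, H)
        logits = self.classifier(self.dropout(cls_vec))  # C W^T (+ bias)
        # Softmax shown to match the paper's formula; in training,
        # cross-entropy is normally applied to the raw logits.
        return torch.softmax(logits, dim=-1)

head = ClassificationHead(H, K)
P = head(torch.randn(2, H))   # e.g. a batch of two [CLS] vectors
print(P.sum(dim=-1))          # each row sums to 1
```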
2.3.3 BERT-single and BERT-pair

BERT-single for (T)ABSA: BERT for single-sentence classification tasks. Suppose the numbers of target categories and aspect categories are n_t and n_a, respectively. We treat TABSA as a combination of n_t · n_a target-aspect-related sentiment classification problems, solving each classification problem and then summarizing the results. For ABSA, we fine-tune the pre-trained BERT model to train n_a classifiers, one per aspect, and then summarize the results.

BERT-pair for (T)ABSA: BERT for sentence-pair classification tasks. Based on the auxiliary sentences constructed in Section 2.2, we use the sentence-pair classification approach to solve (T)ABSA. Corresponding to the four ways of constructing sentences, we name the models BERT-pair-QA-M, BERT-pair-NLI-M, BERT-pair-QA-B, and BERT-pair-NLI-B.

3 Experiments

3.1 Datasets

We evaluate our method on the SentiHood dataset (Saeidi et al., 2016; mirror: https://github.com/uclmr/jack/tree/master/data/sentihood), which consists of 5,215 sentences, 3,862 of which contain a single target and the remainder multiple targets. Each sentence contains a list of target-aspect pairs {t, a} with the sentiment polarity y. Ultimately, given a sentence s and a target t in the sentence, we need to: (1) detect the mention of an aspect a for the target t; (2) determine the positive or negative sentiment polarity y for each detected target-aspect pair.

We also evaluate our method on the SemEval-2014 Task 4 dataset (Pontiki et al., 2014; http://alt.qcri.org/semeval2014/task4/) for aspect-based sentiment analysis. The only difference from SentiHood is that the target-aspect pairs {t, a} become only aspects a. This setting allows us to jointly evaluate subtask 3 (Aspect Category Detection) and subtask 4 (Aspect Category Polarity).

3.2 Hyperparameters

We use the pre-trained uncased BERT-base model (https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip) for fine-tuning. It has 12 Transformer blocks, a hidden size of 768, 12 self-attention heads, and 110M parameters in total. When fine-tuning, we keep the dropout probability at 0.1 and set the number of epochs to 4. The initial learning rate is 2e-5, and the batch size is 24.
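Put together, fine-tuning a BERT-pair model means feeding (original sentence, auxiliary sentence) pairs to a sentence-pair classifier with the hyperparameters above (learning rate 2e-5, batch size 24, 4 epochs). A hedged sketch using the Hugging Face transformers library; the authors' released code predates this library, so treat this as an equivalent reformulation rather than their exact setup:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # positive / negative / none

# One NLI-M training example: review sentence + auxiliary pseudo-sentence.
batch = tokenizer(
    ["LOCATION1 is often considered the coolest area of London."],
    ["location - 1 - safety"],     # auxiliary sentence as segment B
    padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([2])         # assumed label order: index 2 = "none"

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # paper's LR
model.train()
out = model(**batch, labels=labels)  # cross-entropy loss over 3 classes
out.loss.backward()
optimizer.step()
```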
3.3 Exp-I: TABSA

We compare our model with the following baselines:

- LR (Saeidi et al., 2016): a logistic regression classifier with n-gram and POS-tag features.
- LSTM-Final (Saeidi et al., 2016): a biLSTM model using the final state as the representation.
- LSTM-Loc (Saeidi et al., 2016): a biLSTM model using the state at the target position as the representation.
- LSTM+TA+SA (Ma et al., 2018): a biLSTM model that introduces complex target-level and sentence-level attention mechanisms.
- SenticLSTM (Ma et al., 2018): an upgraded version of LSTM+TA+SA that incorporates external information from SenticNet (Cambria et al., 2016).
- Dmu-Entnet (Liu et al., 2018): a bidirectional EntNet (Henaff et al., 2016) with external "memory chains" and a delayed memory update mechanism to track entities.

During the evaluation on SentiHood, following Saeidi et al. (2016), we only consider the four most frequent aspects (general, price, transit-location, safety). For aspect detection, following Ma et al. (2018), we use strict accuracy and macro-F1, and we also report AUC. For sentiment classification, we use accuracy and macro-average AUC as the evaluation metrics.

                                   Aspect              Sentiment
Model                              Acc.  F1    AUC     Acc.  AUC
LR (Saeidi et al., 2016)           -     39.3  92.4    87.5  90.5
LSTM-Final (Saeidi et al., 2016)   -     68.9  89.8    82.0  85.4
LSTM-Loc (Saeidi et al., 2016)     -     69.3  89.7    81.9  83.9
LSTM+TA+SA (Ma et al., 2018)       66.4  76.7  -       86.8  -
SenticLSTM (Ma et al., 2018)       67.4  78.2  -       89.3  -
Dmu-Entnet (Liu et al., 2018)      73.5  78.5  94.4    91.0  94.8
BERT-single                        73.7  81.0  96.4    85.5  84.2
BERT-pair-QA-M                     79.4  86.4  97.0    93.6  96.4
BERT-pair-NLI-M                    78.3  87.0  97.5    92.1  96.5
BERT-pair-QA-B                     79.2  87.9  97.1    93.3  97.0
BERT-pair-NLI-B                    79.8  87.5  96.6    92.8  96.9

Table 3: Performance on the SentiHood dataset. We use the results reported in Saeidi et al. (2016), Ma et al. (2018), and Liu et al. (2018). "-" means not reported.

3.3.1 Results

Results on SentiHood are presented in Table 3. BERT-single beats Dmu-Entnet on aspect detection, but its sentiment classification accuracy is much lower than that of both SenticLSTM and Dmu-Entnet, by 3.8 and 5.5 points respectively. BERT-pair, however, outperforms the other models on both aspect detection and sentiment classification by a substantial margin, obtaining improvements of 9.4 points in macro-F1 and 2.6 points in accuracy over Dmu-Entnet. Overall, the four BERT-pair models perform similarly. It is worth noting that the BERT-pair-NLI models perform relatively better on aspect detection, while the BERT-pair-QA models perform better on sentiment classification. In addition, BERT-pair-QA-B and BERT-pair-NLI-B achieve better AUC values on sentiment classification than the other models.
3.4 Exp-II: ABSA

The benchmarks for SemEval-2014 Task 4 are the two best-performing systems in Pontiki et al. (2014) and ATAE-LSTM (Wang et al., 2016). When evaluating subtask 3 and subtask 4, following Pontiki et al. (2014), we use micro-F1 and accuracy respectively.

Models           P      R      F1
XRCE             83.23  81.37  82.29
NRC-Canada       91.04  86.24  88.58
BERT-single      92.78  89.07  90.89
BERT-pair-QA-M   92.87  90.24  91.54
BERT-pair-NLI-M  93.15  90.24  91.67
BERT-pair-QA-B   93.04  89.95  91.47
BERT-pair-NLI-B  93.57  90.83  92.18

Table 4: Test set results for SemEval-2014 Task 4 subtask 3: Aspect Category Detection. We use the results reported in XRCE (Brun et al., 2014) and NRC-Canada (Kiritchenko et al., 2014).

Models           4-way  3-way  Binary
XRCE             78.1   -      -
NRC-Canada       82.9   -      -
LSTM             -      82.0   88.3
ATAE-LSTM        -      84.0   89.9
BERT-single      83.7   86.9   93.3
BERT-pair-QA-M   85.2   89.3   95.4
BERT-pair-NLI-M  85.1   88.7   94.4
BERT-pair-QA-B   85.9   89.9   95.6
BERT-pair-NLI-B  84.6   88.7   95.1

Table 5: Test set accuracy (%) for SemEval-2014 Task 4 subtask 4: Aspect Category Polarity. We use the results reported in XRCE (Brun et al., 2014), NRC-Canada (Kiritchenko et al., 2014), and ATAE-LSTM (Wang et al., 2016). "-" means not reported.

3.4.1 Results

Results on SemEval-2014 are presented in Table 4 and Table 5. BERT-single achieves better results than the previous systems on both subtasks, and BERT-pair achieves further improvements over BERT-single. BERT-pair-NLI-B performs best on aspect category detection, while BERT-pair-QA-B performs best on aspect category polarity across all of the 4-way, 3-way, and binary settings.

4 Discussion

Why do the BERT-pair models perform so much better? On the one hand, converting the target and aspect information into an auxiliary sentence is equivalent to exponentially expanding the corpus: a sentence s_i in the original dataset is expanded into (s_i, t_1, a_1), ..., (s_i, t_1, a_{n_a}), ..., (s_i, t_{n_t}, a_{n_a}) in the sentence-pair classification task. On the other hand, the striking improvements of BERT on the QA and NLI tasks (Devlin et al., 2018) show that the BERT model has an advantage in sentence-pair classification, an advantage that comes from both its unsupervised masked language model and its next sentence prediction objectives.
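The expansion argument is easy to check concretely: with n_t targets and n_a aspects, each original sentence yields n_t · n_a sentence-pair examples, and three times as many in the -B settings. A quick illustration (ours) for the SentiHood setting of Table 1:

```python
n_t, n_a = 2, 4   # targets and aspects in the Table 1 example
n_labels = 3      # positive / negative / none

pairs_m = n_t * n_a             # QA-M / NLI-M: one pair per (t, a)
pairs_b = n_t * n_a * n_labels  # QA-B / NLI-B: one pair per (t, a, y)
print(pairs_m, pairs_b)         # 8 24
```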
TABSA is more complicated than SA because of the additional target and aspect information. Directly fine-tuning the pre-trained BERT on TABSA does not yield a performance gain. However, when we separate the target and the aspect to form an auxiliary sentence and transform TABSA into a sentence-pair classification task, the scenario becomes similar to QA and NLI, and the advantage of the pre-trained BERT model can be fully exploited. Our approach is not limited to TABSA; the same construction method can be used for other similar tasks. For ABSA, we can construct the auxiliary sentence in the same way using only the aspects.

Among the BERT-pair models, BERT-pair-QA-B and BERT-pair-NLI-B achieve better AUC values on sentiment classification, probably because they model the label information explicitly.

5 Conclusion

In this paper, we constructed an auxiliary sentence to transform (T)ABSA from a single-sentence classification task into a sentence-pair classification task. We fine-tuned the pre-trained BERT model on the sentence-pair classification task and obtained new state-of-the-art results. We compared the experimental results of single-sentence classification and sentence-pair classification based on BERT fine-tuning, analyzed the advantages of sentence-pair classification, and verified the validity of our conversion method. In the future, we will apply this conversion method to other similar tasks.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. The research work is supported by Shanghai Municipal Science and Technology Commission (No. 16JC1420401 and 17JC1404100), National Key Research and Development Program of China (No. 2017YFB1002104), and National Natural Science Foundation of China (No. 61672162 and 61751201).

References

Caroline Brun, Diana Nicoleta Popa, and Claude Roux. 2014. XRCE: Hybrid classification for aspect-based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 838–842.

Erik Cambria, Soujanya Poria, Rajiv Bajpai, and Björn Schuller. 2016. SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2666–2677.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann LeCun. 2016. Tracking the world state with recurrent entity networks. arXiv preprint arXiv:1612.03969.

Yohan Jo and Alice H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 815–824. ACM.

Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. 2014. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437–442.

Fei Liu, Trevor Cohn, and Timothy Baldwin. 2018. Recurrent entity networks with delayed memory update for targeted aspect-based sentiment analysis. arXiv preprint arXiv:1804.11019.

Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In Proceedings of AAAI.

Thien Hai Nguyen and Kiyoaki Shirai. 2015. PhraseRNN: Phrase recursive neural network for aspect-based sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2509–2514.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35. Association for Computational Linguistics.

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 task 12: Aspect based sentiment analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 486–495.

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, AL-Smadi Mohammad, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, et al. 2016. SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 19–30.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.

Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, and Sebastian Riedel. 2016. SentiHood: Targeted aspect based sentiment analysis dataset for urban neighbourhoods. arXiv preprint arXiv:1610.03771.

Duyu Tang, Bing Qin, Xiaocheng Feng, and Ting Liu. 2015. Effective LSTMs for target-dependent sentiment classification. arXiv preprint arXiv:1512.01100.

Duyu Tang, Bing Qin, and Ting Liu. 2016. Aspect level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900.

Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster, and Lamia Tounsi. 2014. DCU: Aspect-based polarity classification for SemEval task 4. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 223–229.

Bo Wang, Maria Liakata, Arkaitz Zubiaga, and Rob Procter. 2017. TDParse: Multi-target-specific sentiment recognition on Twitter. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 483–493.

Yequan Wang, Minlie Huang, Li Zhao, et al. 2016. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615.