Sentiment Analysis
Daniel Fernández González, Universidad de Vigo
Summary
This presentation discusses sentiment analysis, its various levels (document, sentence, phrase, and aspect), and different techniques used. It covers lexicon-based and machine learning approaches. Applications in business intelligence and examples of product reviews are also included.
Full Transcript
TEXT MINING: SENTIMENT ANALYSIS
Daniel Fernández González ([email protected])

INTRODUCTION
Aka Opinion Analysis or Opinion Mining.

WHAT IS AN OPINION?
Basic Opinion Representation:
- Opinion holder: Whose opinion is this?
- Opinion target: What is this opinion about?
- Opinion content (or sentiment expression): What exactly is the opinion?
- Opinion context: Under what situation (e.g., time, location, …) was the opinion expressed?
- Opinion sentiment (or polarity): What does the opinion tell us about the opinion holder's feeling (e.g., positive, negative or neutral)?

PRODUCT REVIEW
"Peter (iPhone 15): It is too expensive"
- Opinion holder: Peter
- Opinion target: iPhone 15
- Opinion content: "is too expensive"
- Opinion context: 2023
- Opinion sentiment: negative

OPINION TYPES IN TEXT DATA
Indirect/inferred opinion: "This phone ran out of battery in just 1 hour"

OPINION MINING TASK

SENTIMENT ANALYSIS LEVELS
- Document-level: performed on a whole document. Output: a single polarity.
- Sentence-level: performed on a whole sentence. Output: a single polarity per sentence.
- Phrase-level: performed on phrases. Output: several polarities per sentence.
- Aspect-level: the most fine-grained SA.
A phrase can contain several aspects, and polarity is assigned to each aspect in the sentence. E.g., in "The camera of iPhone 15 is awesome", the review is on "camera", which is an aspect of the entity "iPhone 15". Output: several polarities per sentence.

SENTIMENT ANALYSIS TASKS
Most common tasks:
- Sentence-level Sentiment Classification
- Aspect-based Sentiment Analysis (ABSA)
- Structured Sentiment Analysis (SSA)

APPLICATIONS
- Business intelligence: e.g., "The predictive power of public Twitter sentiment for forecasting cryptocurrency prices" (Kraaijeveld and De Smedt, 2020)
- Recommendation systems: e.g., "An intelligent movie recommendation system through group-level sentiment analysis in microblogs" (Li et al., 2016)
- Government intelligence: e.g., "Sentiment analysis of brexit negotiating outcomes" (Georgiadou et al., 2020); "Measuring Proximity Between Newspapers and Political Parties" (Falck et al., 2019)
- Healthcare and medical domain: e.g., "A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter" (Clark et al., 2018)

DEMOS
Text2data, Dandelion, Hugging Face, MonkeyLearn

GENERAL PROCEDURE OF SA
Data collection and extraction
- Sources: social media, e-commerce websites, review websites, weblogs, forums, interview transcripts, …
- Resources: APIs (e.g., Twitter API or Facebook API); freely available datasets (e.g., Sentiment140, IMDb dataset, Stanford Sentiment Treebank, EmoEvent, Yelp Review dataset, SemEval 2013-2017, …); web scraping (e.g., ParseHub); crowdsourcing (e.g., Amazon Mechanical Turk)

GENERAL PROCEDURE OF SA
Data preprocessing
- Tokenization
- Stop-word removal (e.g., "the", "for", "under")
- Expanding abbreviations and/or removing repeated characters (e.g., "liiiiike", "greeaaatttt")
- Part-of-Speech (PoS) tagging
- Lemmatization

GENERAL PROCEDURE OF SA
Feature extraction
- Term presence (e.g., unigrams, bigrams, trigrams) and frequency (TF-IDF weighting scheme)
- Part-of-Speech (PoS) tags (e.g., adjectives)
- Opinion words and phrases (e.g., good, wonderful, terrible, LOL, …)
- Negations (aka opinion shifters or valence shifters) (e.g., not, never, none, nobody, nowhere, neither, and cannot)

GENERAL PROCEDURE OF SA
Sentiment analysis techniques

SENTIMENT ANALYSIS TECHNIQUES
Lexicon-based approach
- Also called knowledge-based.
- It requires a sentiment lexicon to score words (e.g., +1, -1 or 0, or other values in [-1, 1]). The document polarity is obtained by combining the word scores.
- The main problem of this approach is domain dependency, because words can have multiple meanings and senses. E.g., "small" in "The TV screen is too small" and "This camera is very small". Some lexicon adaptation approaches have been proposed, e.g., Sanagar and Gupta (2020).

SENTIMENT ANALYSIS TECHNIQUES
Lexicon-based approach: techniques for annotating sentiment lexicons
- Manual approach. Examples: MPQA Subjectivity Lexicon (Wilson et al., 2005), Semantic Orientation CALculator (Taboada et al., 2011), crowdsourcing (Mohammad and Turney, 2013), or gamification (Tower of Babel by Hong et al., 2013).
- Dictionary-based approach: uses initial seed words and expands them based on dictionaries (such as WordNet) by searching for synonyms and antonyms. E.g., SentiWordNet (Baccianella et al., 2010) and SentiStrength (Thelwall et al., 2010). Incapable of finding domain-oriented opinion terms that are not included in the dictionary used.
- Corpus-based approach: uses initial seed words and expands them based on syntactic and co-occurrence patterns (e.g., "simple AND easy") in a large corpus. Capable of identifying opinion terms with a particular content orientation; therefore, it provides superior results when domains are distinct.
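As a sketch of the lexicon-based approach described above, the snippet below scores a sentence by summing per-word lexicon values and flipping the sign of any word preceded by a negation shifter. The tiny lexicon, its scores, and the one-token negation window are invented for this illustration; real systems use resources such as SentiWordNet and richer shifter handling.

```python
# Minimal lexicon-based polarity scorer (illustrative sketch; the lexicon
# entries and the one-token negation window are made up for this example).

NEGATORS = {"not", "never", "none", "nobody", "nowhere", "neither", "cannot"}

LEXICON = {  # word -> polarity score in [-1, 1]
    "good": 0.7, "wonderful": 1.0, "awesome": 0.9,
    "terrible": -1.0, "expensive": -0.5, "small": -0.3,
}

def score(text: str) -> float:
    """Sum word scores; a negator right before a word flips its polarity."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        s = LEXICON.get(tok.strip(".,!?"), 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            s = -s  # valence shifter flips the sign
        total += s
    return total

print(score("the camera is awesome"))  # > 0 (positive)
print(score("it is not good"))         # < 0 (negative)
```

Note that this scorer also exhibits the domain-dependency problem mentioned above: "small" gets the same fixed score in "The TV screen is too small" and in "This camera is very small", even though only the first use is negative.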
SENTIMENT ANALYSIS TECHNIQUES
Machine learning approach
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning

SENTIMENT ANALYSIS TECHNIQUES
Supervised learning
- Traditional techniques: linear classifiers (SVMs, ANNs), probabilistic classifiers (Bayesian Network, Maximum Entropy), decision trees, rule-based.
- Deep learning.

DEEP LEARNING IN SA
Tasks: Sentence-level Sentiment Classification, Aspect-based Sentiment Analysis (ABSA), Structured Sentiment Analysis (SSA).

DEEP LEARNING IN SA
Sentence-level Sentiment Classification
- CNN-based models: Kim (2014) and Kalchbrenner et al. (2014). Hybrid approach: Shin et al. (2016) integrate lexicon embeddings into CNNs.
- Recursive-NN-based models: Socher et al. (2013). Five labels: --, -, 0, +, ++.
- RNN-based models: Wang et al. (2015) propose an LSTM-based approach. Tai et al. (2015) present BiLSTM-based and Tree-LSTM approaches. Wang et al. (2016) propose a joint CNN-RNN approach. Li et al. (2020) enhance a CNN-LSTM-based model with a sentiment lexicon.
- LM-based models: SentiBERT (Yin et al., 2020), evaluated on the Stanford Sentiment Treebank: SST-phrase (5-class classification at phrase level) and SST-5 (5-class classification at sentence level).

DEEP LEARNING IN SA
Aspect-based Sentiment Analysis (ABSA)
- It considers both the sentiment and the target information.
- Main steps in neural approaches:
  1. Representing the context of a target.
  2. Generating target representations.
  3. Identifying the important context (words) for the specific target.
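Under made-up toy embeddings (the 2-d vectors below are hand-picked, not trained), those three steps can be sketched as dot-product attention over the context words for a given target:

```python
import math

# Hand-picked 2-d "embeddings" for illustration only (not trained vectors).
EMB = {
    "the": [0.0, 0.1], "camera": [0.9, 0.2], "is": [0.0, 0.0],
    "awesome": [0.8, 0.3], "battery": [0.1, 0.9], "screen": [0.2, 0.8],
}

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attention_weights(sentence, target):
    """Steps 1-3: represent the context words, represent the target, and
    weight each context word by its dot-product similarity to the target."""
    t = EMB[target]
    scores = [sum(a * b for a, b in zip(EMB[w], t)) for w in sentence]
    return dict(zip(sentence, softmax(scores)))  # assumes unique words

weights = attention_weights(["the", "camera", "is", "awesome"], "camera")
# "camera" and "awesome" receive more attention mass than "the" and "is"
```

Attention-based ABSA models such as Wang et al. (2016b) learn these scores end-to-end instead of computing fixed dot products, but the shape of the computation is the same: one normalized weight per context word, conditioned on the aspect.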
DEEP LEARNING IN SA
Aspect-based Sentiment Analysis (ABSA)
- The SemEval-2014 Task 4 contains two domain-specific datasets, for laptops and restaurants, consisting of over 6K sentences with fine-grained aspect-level human annotations:
  - Subtask 1: Aspect term extraction
  - Subtask 2: Aspect term polarity (Aspect-Target Sentiment Classification, ATSC)
  - Subtask 3: Aspect category detection
  - Subtask 4: Aspect category polarity
- Some early works:
  - Dong et al. (2014) use an Adaptive Recursive NN for Twitter ABSA. Similar to Socher et al. (2013), they use a composition function based on parse trees to distribute the sentiment of words to aspects in the sentence.
  - Tang et al. (2016) implement target-dependent LSTMs. They do not require external parse trees or lexicons.
  - Wang et al. (2016b) propose an attention-based LSTM for ABSA. They also leverage aspect embeddings.
- SOTA models:
  - Zeng et al. (2019): a Local Context Focus (LCF) mechanism on BERT is proposed for ABSA.
  - Rietzler et al. (2020) fine-tune BERT for ABSA: instead of [CLS] sentA [SEP] sentB [SEP], they use [CLS] sent [SEP] aspect [SEP].
  - Zeng et al. (2019) vs. Rietzler et al. (2020): compared using Targeted F1, where a true positive requires the combination of exact extraction of the sentiment target and the correct polarity. Based on that, precision and recall are computed.

DEEP LEARNING IN SA
Structured Sentiment Analysis (SSA)
- Introduced by Barnes et al. (2021). SemEval 2022 Task 10 focused on SSA.
- It represents all opinion and sentiment information in a Sentiment Graph.
- Similar but less complete tasks: opinion role labeling or aspect sentiment triplet extraction.
- SemEval 2022 Task 10 datasets:
  - MultiBooked: a collection of hotel reviews in Basque and Catalan, written by users and collected from booking.com.
  - OpeNER: an opinion mining corpus of hotel reviews for six languages. For the purposes of the shared task, only the English and Spanish data were used.
  - MPQA: annotated English newswire text.
  - DSUnis: English university reviews.
- Evaluation metrics:
  - Sentiment Graph F1 (SF1): each sentiment subgraph is a tuple of (holder, target, expression, polarity). A true positive is defined as an exact match at graph level, weighting the overlap between the predicted and gold spans for each element, averaged across all three spans (holder, target and expression).
For precision, they weight the number of correctly predicted tokens divided by the total number of predicted tokens; for recall, they divide instead by the number of gold tokens.
  - Targeted F1: the metric used in ABSA, where a true positive just requires the combination of exact extraction of the sentiment target and the correct polarity.
  - Token-level F-score for holders, targets and expressions: evaluates how well the model identifies the elements of a sentiment graph.
- Barnes et al. (2021) propose to address end-to-end SSA as dependency parsing, with head-first and head-final encodings. They then apply a semantic dependency parser (Dozat and Manning, 2018).
- SOTA model by Samuel et al. (2022): it directly produces a Sentiment Graph by applying the text-to-graph PERIN model (Samuel and Straka, 2020) with the XLM-R language model as encoder.

REFERENCES
T. Wilson, J. Wiebe, P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT/EMNLP 2005, Association for Computational Linguistics, 2005, pp. 347–354. https://doi.org/10.3115/1220575.1220619
S. Baccianella, A. Esuli, F. Sebastiani. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of LREC 2010, European Language Resources Association, Valletta, Malta, 2010, pp. 1–5. http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf
M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, A. Kappas. Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2010, pp. 2544–2558.
M. Taboada, J. Brooke, M. Tofiloski, K. Voll, M. Stede. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37, 2011, pp. 267–307. https://doi.org/10.1162/COLI_a_00049
S.M. Mohammad, P.D. Turney. Crowdsourcing a Word-Emotion Association Lexicon. 2013, pp. 1–25. http://arxiv.org/abs/1308.6297
Y. Hong, H. Kwak, Y. Baek, S. Moon. Tower of Babel. In Proceedings of the 22nd International Conference on World Wide Web Companion (WWW '13), ACM Press, New York, USA, 2013, pp. 549–556. https://doi.org/10.1145/2487788.2487993
Y. Kim. Convolutional neural networks for sentence classification. In Proceedings of EMNLP 2014, Association for Computational Linguistics, 2014, pp. 1746–1751. https://doi.org/10.3115/v1/D14-1181
L. Dong, F. Wei, C. Tan, D. Tang, M. Zhou, K. Xu. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of ACL 2014.
N. Kalchbrenner, E. Grefenstette, P. Blunsom. A convolutional neural network for modelling sentences. In Proceedings of ACL 2014.
X. Wang, Y. Liu, C. Sun, B. Wang, X. Wang. Predicting polarities of tweets by composing word embeddings with long short-term memory. In Proceedings of ACL 2015.
X. Wang, W. Jiang, Z. Luo. Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In Proceedings of COLING 2016.
Y. Wang, M. Huang, X. Zhu, L. Zhao. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of EMNLP 2016.
D. Tang, B. Qin, X. Feng, T. Liu. Effective LSTMs for target-dependent sentiment classification. In Proceedings of COLING 2016, pp. 3298–3307, Osaka, Japan.
B. Shin, T. Lee, J.D. Choi. Lexicon Integrated CNN Models with Attention for Sentiment Analysis. 2016, pp. 1–10. http://arxiv.org/abs/1610.06272
H. Li, J. Cui, B. Shen, J. Ma. An intelligent movie recommendation system through group-level sentiment analysis in microblogs. Neurocomputing, 210, 2016, pp. 164–173. https://doi.org/10.1016/j.neucom.2015.09.134
L. Zhang, S. Wang, B. Liu. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8, 2018, pp. 1–25. https://doi.org/10.1002/widm.1253
E.M. Clark, T. James, C.A. Jones, A. Alapati, P. Ukandu, C.M. Danforth, P.S. Dodds. A Sentiment Analysis of Breast Cancer Treatment Experiences and Healthcare Perceptions Across Twitter. 2018, pp. 1–17. http://arxiv.org/abs/1805.09959
T. Dozat, C.D. Manning. Simpler but more accurate semantic dependency parsing. In Proceedings of ACL 2018 (Volume 2: Short Papers), pp. 484–490, Melbourne, Australia. Association for Computational Linguistics.
F. Falck, J. Marstaller, N. Stoehr, S. Maucher, J. Ren, A. Thalhammer, A. Rettinger, R. Studer. Measuring Proximity Between Newspapers and Political Parties: The Sentiment Political Compass. Policy & Internet, 2019, pp. 1–33. https://doi.org/10.1002/poi3.222
B. Zeng, H. Yang, R. Xu, W. Zhou, X. Han. LCF: A Local Context Focus Mechanism for Aspect-Based Sentiment Classification. Applied Sciences, 9, 3389, 2019. https://doi.org/10.3390/app9163389
E. Georgiadou, S. Angelopoulos, H. Drake. Big data analytics and international negotiations: Sentiment analysis of brexit negotiating outcomes. International Journal of Information Management, 51, 2020, pp. 1–9. https://doi.org/10.1016/j.ijinfomgt.2019.102048
O. Kraaijeveld, J. De Smedt. The predictive power of public Twitter sentiment for forecasting cryptocurrency prices. Journal of International Financial Markets, Institutions and Money, 65, 2020, pp. 1–22. https://doi.org/10.1016/j.intfin.2020.101188
A. Rietzler, S. Stabinger, P. Opitz, S. Engl. Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification. In Proceedings of LREC 2020, pp. 4933–4941, Marseille, France. European Language Resources Association.
S. Sanagar, D. Gupta. Automated genre-based multi-domain sentiment lexicon adaptation using unlabeled data. Journal of Intelligent & Fuzzy Systems, 38, 2020, pp. 6223–6234. https://doi.org/10.3233/JIFS-179704
W. Li, L. Zhu, Y. Shi, K. Guo, E. Cambria. User reviews: Sentiment analysis using lexicon integrated two-channel CNN-LSTM family models. Applied Soft Computing, 94, 2020.
D. Yin, T. Meng, K.-W. Chang. SentiBERT: A transferable transformer-based architecture for compositional sentiment semantics. 2020.
D. Samuel, M. Straka. ÚFAL at MRP 2020: Permutation-invariant semantic parsing in PERIN. In Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, Association for Computational Linguistics, 2020, pp. 53–64. https://aclanthology.org/2020.conll-shared.5
J. Barnes, R. Kurtz, S. Oepen, L. Øvrelid, E. Velldal. Structured Sentiment Analysis as Dependency Graph Parsing. In Proceedings of ACL-IJCNLP 2021 (Volume 1: Long Papers), pp. 3387–3402, Online. Association for Computational Linguistics.
D. Samuel, J. Barnes, R. Kurtz, S. Oepen, L. Øvrelid, E. Velldal. Direct parsing to sentiment graphs. In Proceedings of ACL 2022 (Volume 2: Short Papers), pp. 470–478, Dublin, Ireland. Association for Computational Linguistics.