Intelligent Systems & Techniques (IS4242) Lecture Notes - Electronic Word of Mouth
Summary
These lecture notes cover Electronic Word of Mouth (eWOM) and its application to stock price prediction. They discuss why consumers engage in eWOM, what makes it persuasive, how firms can manage it, and how eWOM sentiment relates to stock prices, including the machine learning models and techniques used in this area.
IS4242 INTELLIGENT SYSTEMS & TECHNIQUES
L10 – Electronic Word of Mouth
Aditya Karanam
© Copyright National University of Singapore. All Rights Reserved.

Announcements
▸ Project Check-in
‣ The deadline is in Week 12

In this Lecture…
▸ What is Word of Mouth (WOM) and Electronic Word of Mouth (eWOM)?
▸ Why do consumers engage in eWOM?
‣ Motivations and persuasive power
▸ How to manage eWOM?
▸ Examining the impact of eWOM sentiment on stock prices
‣ Brief overview of word embeddings
‣ Intuition: Bidirectional Long Short-Term Memory networks

Word of Mouth
▸ Every day, we have many conversations in which we actively seek product-related input from family, friends, and even virtual strangers
‣ Over 2.4 billion conversations each day involve brands
▸ These conversations provide marketing for the brands; this is called Word-of-Mouth Marketing, or simply word of mouth
▸ Considered the most effective among the various forms of marketing
‣ 92% of consumers trust recommendations from family and friends more than any other type of advertising (Nielsen 2012)

Word of Mouth: Definition
▸ Definition: Oral, person-to-person communication between a receiver and a communicator whom the receiver perceives as non-commercial, concerning a brand, a product, or a service.
‣ WOM is interpersonal communication, and thus differs from mass communication such as advertising and other impersonal channels
‣ The content of these communications should be commercial
‣ Individuals use WOM for all kinds of communication; WOM in marketing refers to messages about commercial entities (products, brands, etc.)
‣ Though the content of the communication is commercial, communicators are not commercially motivated (at least in the receiver's eyes)
‣ WOM is commercial in content but non-commercial in perception

Online is seamlessly integrated into our offline lives!
▸ In today's world, there is no offline or online socializing – it is simply socializing!
▸ In 2020, every minute, TikTok users watched 167 million videos, Instagram users shared 65,000 photos, YouTube users streamed 694,000 hours, and 2 million Snapchats were sent (Domo 2021)!
▸ Firms are also blending their offline and online communications
‣ E.g., TV and print ads ask consumers to go online and like, follow, or subscribe to the brand's Facebook pages

Electronic Word of Mouth (eWOM)
▸ Definition: the dynamic and ongoing information exchange process between consumers (potential, actual, or former) regarding a product, service, brand, or company, which is available to a multitude of people and institutions
▸ eWOM communication is not a static process but a dynamic and ongoing information exchange
‣ Messages can spread online spontaneously
▸ The definition does not require that the source of the communication be perceived as non-commercial by the receivers

Electronic Word of Mouth: Examples
▸ The power of eWOM lies in making products or product promotions go viral
‣ This can have a positive or negative impact on sales
▸ Meme stocks are a prime example of eWOM
‣ Stock prices are driven by viral posts on social media
‣ Example: GameStop

Word of Mouth Characteristics
▸ What motivates consumers to produce word of mouth?
‣ Altruism
‣ Social benefits
‣ Economic incentives
▸ What makes eWOM so persuasive?
‣ Credibility of the sender, helpfulness of the content, etc.
▸ What are the firm's response strategies?
Motivations to produce eWOM: Altruism
▸ Altruism: improving the welfare of one or more people other than oneself
▸ People with altruistic motives volunteer to share eWOM to help other customers
‣ E.g., people share reviews simply because it helps others, on platforms such as Amazon, Yelp, Google, etc.
▸ Connected to the feeling of pleasure obtained from helping, or the empathy an individual feels toward others

Motivations to produce eWOM: Social Benefits
▸ Consumers engage in eWOM to participate in and be part of a virtual community
▸ Affiliation with a virtual community provides a social benefit to an individual through identification and social integration
‣ E.g., Yelp Elite Squad, Destination Expert on TripAdvisor, moderator roles on Reddit, Stack Exchange, etc.

Motivations to produce eWOM: Economic Incentives
▸ Consumers also engage in eWOM to obtain an economic incentive
▸ These can take the form of web points or coupons provided through opinion platforms in exchange for eWOM
‣ E.g., the Amazon Vine program for reviewers, higher chances of recruitment based on one's status on Stack Exchange, GitHub contributions, etc.
▸ Economic incentive is a distinctive motivation of eWOM
‣ Compared with traditional WOM, eWOM is exchanged through third-party businesses, which have become highly successful
‣ E.g., review aggregator websites such as Epinions.com, Rotten Tomatoes, etc.

The Persuasiveness of eWOM
▸ Credibility: the degree to which an individual perceives a recommendation from others as believable or factual
▸ Helpfulness (or usefulness): the degree to which the information assists consumers in making their purchase decisions
▸ If people believe that the received information is credible and helpful, they are more confident in using eWOM information to make purchase decisions

The Persuasiveness of eWOM
▸ Perceived credibility and helpfulness are influenced by several factors:
‣ Quality of the content: average rating, length of the review, consistency across recommendations, etc.
‣ Characteristics of the information source: expertise, trustworthiness, attractiveness, etc.
▸ Vendors with low-quality products have been found to manipulate their reviews to influence consumer decisions
‣ E.g., Reddit Marketing Pro sells fake reviews, backlinks, Reddit upvotes, etc. (https://reddit-marketing.pro/)

The Persuasiveness of eWOM
▸ Emotional content, especially negativity, spreads faster online
‣ People tend to latch on to negative information more than positive information
‣ E.g., the Facebook mood manipulation experiment
▸ News organizations use negative news to improve engagement (https://www.hsph.harvard.edu/chc/2023/05/01/negative-news-headline-more-clicks/)
‣ A user's political affiliation is revealed more by their hostility toward other groups than by strong affiliation with their own group

The Persuasiveness of eWOM
▸ On Twitter, fake news has been found to spread faster and farther than the truth because it is more novel and sensational!
▸ This problem is exacerbated by deepfakes!
‣ E.g., fake AI-generated images of Trump being arrested (https://www.bbc.com/news/world-us-canada-65069316)

Managing eWOM
▸ Over the internet, consumers can express their complaints and negative experiences to a multitude of people with reduced time and cost
▸ Negative eWOM can lead to a negative perception of the brand or company
‣ And negativity spreads faster!
▸ Monitoring eWOM is therefore of utmost priority for all companies
‣ Companies should capture and analyse all the reviews, opinions, etc.

Social Media Monitoring: Example
▸ Analyze the opinion or sentiment of a group (e.g., by region)
‣ Run sentiment analysis on all social media mentions of your brand and categorize them by urgency
‣ Prioritize action for brand management
▸ Track sentiment trends for your brand and the competition

Analyzing the Impact of eWOM Sentiment on Stock Price

Task: Sentiment Analysis and Stock Prediction
▸ eWOM is unstructured data (text, images, videos, etc.)
‣ We use textual data (tweets) for this task
▸ Consumers may express several kinds of opinion, such as sentiment, sarcasm, suggestions, etc., in their reviews or tweets about a company or product
▸ Objective: extract consumer sentiment and use it to predict the stock price
‣ Two tasks:
‣ Identifying sentiment (classification) – a Natural Language Processing problem
‣ Predicting the stock price (regression)

Natural Language Processing (NLP)
▸ Language modeling
▸ Information extraction
▸ Summarization
▸ Machine translation
▸ Dialog systems
▸ Opinion mining
‣ Sentiment analysis
‣ Polarity: positive/neutral/negative opinion

Opinion
▸ Text contains facts and opinions
‣ Opinion: a subjective expression about a topic
‣ Polarity: positive, negative, or neutral opinion
‣ "The earphone is of bad quality; it broke in two days"
‣ "I am so happy to see you"
‣ "The company's stock price is above its fundamental value"
‣ "The battery life of this camera is too short"
‣ "Earth is an oblate spheroid"

Data
▸ Tweets on stock prices from Stocktwits.com
‣ 80,793 valid stock microblog postings
‣ 25 unique stocks, including Tesla, Apple, Microsoft, etc.
▸ Stock price data from Yahoo! Finance for the period from Oct 4, 2021, to Sept 26, 2022, covering a total of 51 weeks
▸ Hand-labeled data of 1,300 tweets (posted between April 9, 2020, and July 16, 2020), each labeled positive, negative, or neutral, for building the model
‣ A multi-class classification problem
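The hand-labeled tweets form the training data for the sentiment classifier. As a minimal sketch (not part of the lecture materials), the snippet below shows how such a labeled set could be loaded and split for the three-class task; the file name and column names ("labeled_tweets.csv", "text", "label") are hypothetical placeholders.

```python
# Minimal sketch: prepare the hand-labeled tweets for the three-class sentiment task.
# File and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file of 1,300 hand-labeled tweets with columns: text, label
labeled = pd.read_csv("labeled_tweets.csv")  # label in {"positive", "negative", "neutral"}

# Map string labels to integer classes for the classifier
label_map = {"negative": 0, "neutral": 1, "positive": 2}
labeled["y"] = labeled["label"].map(label_map)

# Stratified split so all three classes appear in both train and test sets
train_df, test_df = train_test_split(
    labeled, test_size=0.2, stratify=labeled["y"], random_state=42
)
print(train_df["label"].value_counts())
```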
Textual Data
▸ Words are the fundamental units of text (in English)
‣ They obtain meaning from the surrounding words
‣ "River bank" vs. "Federal bank"
▸ Two challenges in performing the task:
‣ Representing words as numeric vectors that encode their meaning
‣ Designing a classifier that captures the sequential dependencies across words

Textual Data: Representation
▸ We need to represent words as numeric vectors to be used as features in machine-learning models
▸ Vector representations need to have two properties:
‣ 'Dense', to help in learning
‣ 'Semantically useful', i.e., capturing the meaning of the words
‣ E.g., 'great' and 'enjoyable' should have similar representations
▸ Feature representations: Bag of Words, TF-IDF, and word embeddings

Document-Term Matrix
▸ Document: a set of words
‣ What counts as a document depends on what is to be analyzed (sentence, tweet, paragraph, report, …)
▸ Term: obtained after preprocessing (stemming, lemmatization, etc.)
‣ Can be words, bigrams, n-grams, etc.
▸ With N documents and V terms in the vocabulary, the document-term matrix has N rows (one per document) and V columns (one per term)

Bag of Words (BoW) Model: Example
▸ The corpus has 2 documents:
‣ Document 1: "I like tomatoes more than apples."
‣ Document 2: "I like reading. I like apples."
▸ Vocabulary: 7 words: {I, like, tomatoes, more, than, apples, reading}
▸ BoW representation using counts:

              I   like  reading  tomatoes  apples  more  than
  Document 1  1    1       0        1        1      1     1
  Document 2  2    2       1        0        1      0     0

Term Frequency – Inverse Document Frequency (TF-IDF)
▸ Notation: t: term, d: document, D: corpus of N documents, c(t, d): count of term t in document d
▸ Term frequency: tf(t, d) = c(t, d) / Σ_{t′ ∈ d} c(t′, d)
▸ Inverse document frequency: idf(t, D) = log( N / (1 + |{d ∈ D : t ∈ d}|) )
▸ TF-IDF: tf-idf(t, d, D) = tf(t, d) × idf(t, D)
▸ idf(t, D) acts as a weight factor
‣ More frequent terms → lower weight; less frequent terms → higher weight
‣ A term that occurs in many documents may not be very useful (e.g., stop words)

TF-IDF: Toy Example
▸ Assume: N = 10 documents; D1 and D2 each contain 100 words in total; "this" appears in 8 documents; "work" appears in 1 document

  Counts        this   work   example  …
  Document D1     1      3       6     …
  Document D2     2      0       1     …

▸ tf("this", D1) = 1/100, tf("this", D2) = 2/100
▸ idf("this", corpus) = log(10/9) ≈ 0.04
▸ tf-idf("this", D1, corpus) = 0.04 × 0.01, tf-idf("this", D2, corpus) = 0.04 × 0.02
▸ tf("work", D1) = 3/100, tf("work", D2) = 0
▸ idf("work", corpus) = log(10/2) ≈ 0.7
▸ tf-idf("work", D1, corpus) = 0.7 × 0.03, tf-idf("work", D2, corpus) = 0

TF-IDF: Toy Example
▸ The resulting TF-IDF representation (same assumptions, with a vocabulary of V = 100 terms):

  TF-IDF        this     work    example  …
  Document D1   0.0004   0.021      …
  Document D2   0.0008   0          …
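To make the two representations concrete, here is a minimal Python sketch (not from the lecture materials) that reproduces the BoW count table with scikit-learn's CountVectorizer and then applies the lecture's TF-IDF formulas to the toy numbers above. Note that the slides round the idf values to 0.04 and 0.7, while the code prints the unrounded results, and that library implementations such as sklearn's TfidfVectorizer use slightly different idf conventions.

```python
# Minimal sketch: BoW counts for the two example documents, plus the lecture's
# TF-IDF formulas applied to the toy numbers (tf = count / doc length,
# idf = log10(N / (1 + document frequency))).
import math
from sklearn.feature_extraction.text import CountVectorizer

# --- Bag of Words for the two example documents ---
docs = ["I like tomatoes more than apples.", "I like reading. I like apples."]
vec = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")  # keep the 1-letter token "i"
bow = vec.fit_transform(docs)
print(vec.get_feature_names_out())  # 7 vocabulary terms (in alphabetical order)
print(bow.toarray())                # counts per document, as in the BoW table

# --- TF-IDF with the lecture's formulas, on the toy numbers ---
def tf(count, doc_len):
    return count / doc_len

def idf(n_docs, doc_freq):
    return math.log10(n_docs / (1 + doc_freq))

N, doc_len = 10, 100                 # 10 documents, 100 words each in D1 and D2
# "this": appears in 8 documents; counts 1 in D1, 2 in D2
print(tf(1, doc_len) * idf(N, 8))    # ~0.00046 (slide rounds to 0.0004)
print(tf(2, doc_len) * idf(N, 8))    # ~0.00092 (slide rounds to 0.0008)
# "work": appears in 1 document; counts 3 in D1, 0 in D2
print(tf(3, doc_len) * idf(N, 1))    # ~0.021
print(tf(0, doc_len) * idf(N, 1))    # 0.0
```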
Problems with this Representation
▸ Synonymy
‣ Different terms, same meaning
‣ 'comical' and 'hilarious' will be treated as different terms
▸ The document-term matrix is:
‣ High dimensional: the vocabulary size runs into the thousands
‣ Sparse: mostly zeros

Word2Vec
▸ A dense representation based on neural networks
▸ Lower-dimensional embeddings (k < V)
▸ Easy to learn
▸ Generalizes better:
‣ Empirically gives better results
‣ Easy to incorporate new words

Word2Vec: Key Idea
▸ A brilliant use of a predictive model to inject "intelligence" (or "meaning") into the representations
▸ Predict (don't count) the "context" of a word from a text corpus
‣ Context: nearby/surrounding words
▸ Classification problem: predict a context word given the focal word
‣ We don't care about the prediction task itself
‣ We care about the weights learned by the network

Word2Vec
▸ 1. Generate training data for classification
▸ 2. Train a network
▸ 3. Use the weights as embeddings
▸ What follows is a brief overview of the Skip-Gram with Negative Sampling (SGNS) architecture

1. Generating the Training Data
▸ Take a large corpus, such as Wikipedia
▸ Positive samples: generate (word, context) pairs for one or more window lengths (e.g., windows of size 3, 4, or 5)
▸ Negative samples: random word pairs that do not occur within a window length of each other

2. Train a Network
▸ Input: one-hot encoding of a word
▸ Outputs: C context words (one-hot encoded)
▸ A single hidden layer
▸ Example: "The quick brown fox jumps"
‣ Input: [0 0 0 1 0] (one-hot vector over the 5-word vocabulary)
‣ Hidden layer: 5 × 300 weight matrix (300 neurons in the hidden layer)
‣ Output: C = 4 context-word predictions (window size of 5)

2. Train a Network
▸ The weight matrix W acts like a lookup table for the input word
▸ The output layer computes the probability of each context word (softmax)

3. Embeddings
▸ Use W (or various combinations of W and W′) as the embeddings
▸ Why does it work?
‣ If two words have similar meanings, their contexts will be similar
‣ That, in turn, makes their embeddings in the weight matrix similar
▸ These embeddings seem to learn the underlying meanings of words!
‣ King – Man + Woman ≈ Queen
▸ Use these feature representations for classification with a neural network
▸ The Gensim library in Python produces efficient embeddings
‣ Pre-trained embeddings of large corpora are also available, such as Stanford's GloVe embeddings (trained on a corpus of ~6 billion tokens)

What Classification Model Should We Use?
▸ Feedforward NNs: each position in the feature vector has fixed semantics
‣ The word 'great' is treated as if it were unrelated to the word 'movie'
▸ Instead, we need to process each word by exploiting the context (surrounding words) of each token

RNN Abstraction
▸ RNNs pass information between cells in the same layer to maintain context
‣ A cell takes the input x and the previous hidden state, and produces the next hidden state and the output y: hₜ = φ(W xₜ + U hₜ₋₁), where φ is the activation function (tanh) and W, U are weight matrices
‣ This helps capture dependencies in sequential data such as text, time series, etc.

RNN: The Problem of Vanishing Gradients
▸ Textual data can be long (~60 words in a sentence)
‣ During backpropagation over long sequences, the gradients flowing back to earlier words quickly go to zero → it is challenging to capture long-range dependencies
‣ E.g., for tanh, the gradient is almost zero if x is not in [-2, 2]
▸ Known as the problem of vanishing gradients

Solution: Long Short-Term Memory (LSTM) Cells
▸ LSTMs: RNNs with long-term memory
‣ They maintain two cell states (memories): a long-term and a short-term state
▸ The long-term state remembers, or preserves, the gradient
‣ Achieved through a clever arrangement of activation functions
▸ You can stack as many LSTM cells as you like (just as with neurons)

Bidirectional LSTM (BiLSTM)
▸ Consider a sentence of n words: w₁, w₂, w₃, …, wₙ
▸ Forward LSTM: the state at wᵢ is computed using the words w₁:ᵢ (it ignores wᵢ₊₁:ₙ)
▸ Backward LSTM: the state at wᵢ is computed using the words wᵢ:ₙ (it ignores w₁:ᵢ₋₁)
▸ Stack the forward and backward LSTMs together to capture context from all the words
‣ Combine the two states by pooling (max, avg) or concatenation

Stock Price Prediction
▸ 1. Start from pre-trained word embeddings (word vectors trained on a corpus of ~6 billion tokens)
▸ 2. Train the BiLSTM model with the labeled tweet data
▸ 3. Predict the sentiment for the full tweet data: ~80,000 sentences
‣ 3.1 Average the sentiment across tweets for a given stock on a given day
▸ 4. Regress it over the stock price with other control variables
‣ Check the coefficient and significance of the sentiment
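A minimal Keras sketch of steps 1–3 (not the lecture's exact implementation): tweets are tokenized, embedded, passed through a bidirectional LSTM, and classified into the three sentiment classes. The embedding layer is randomly initialised here; in the lecture pipeline it would be seeded with the pre-trained word vectors. The toy data and variable names are hypothetical placeholders.

```python
# Minimal sketch: tokenize tweets, embed them, run a BiLSTM, and predict
# one of three sentiment classes (negative / neutral / positive).
import numpy as np
from tensorflow.keras import layers, models

MAX_WORDS, MAX_LEN, EMB_DIM, N_CLASSES = 20000, 60, 100, 3

# Toy stand-ins for the labeled tweets; replace with the real train/test split
train_texts = ["great earnings, buying more", "terrible guidance, selling", "flat day for the stock"]
train_y = np.array([2, 0, 1])  # 0 = negative, 1 = neutral, 2 = positive

# 1. Turn tweets into fixed-length sequences of word indices
vectorizer = layers.TextVectorization(max_tokens=MAX_WORDS, output_sequence_length=MAX_LEN)
vectorizer.adapt(train_texts)
X_train = vectorizer(np.array(train_texts))

# 2. Embedding -> BiLSTM -> softmax over the three sentiment classes
model = models.Sequential([
    layers.Embedding(MAX_WORDS, EMB_DIM),      # would be seeded with pre-trained vectors
    layers.Bidirectional(layers.LSTM(64)),     # forward + backward context, concatenated
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, train_y, epochs=3, batch_size=32)

# 3. Predict sentiment for the full set of ~80,000 tweets (placeholder list here)
all_tweets = ["cannot wait for the product launch"]
probs = model.predict(vectorizer(np.array(all_tweets)))
pred_class = probs.argmax(axis=1)              # per-tweet sentiment label
```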
Stock Price Prediction
▸ Regression results for the full data: the sentiment coefficient is positive and significant

Stock Price Prediction
▸ Stock bullish score: log( (1 + count of positive tweets) / (1 + count of negative tweets) )
‣ Bullish if the score > 0, else bearish
▸ Estimating the regression separately for bullish and bearish stocks shows that eWOM has higher predictive power for bearish stocks

References
▸ Word of mouth:
‣ Elvira Ismagilova, Yogesh K. Dwivedi, Emma Slade, and Michael D. Williams (2017), Electronic Word of Mouth (eWOM) in the Marketing Context.
▸ Word embeddings and BiLSTMs:
‣ http://colah.github.io/posts/2015-08-Understanding-LSTMs/
‣ https://www.cs.utexas.edu/~gdurrett/courses/fa2019/cs388.shtml

Additional: Current State of the Art in NLP
▸ BiLSTMs were state of the art in some contexts until GPT-3.5
▸ A series of innovations replaced LSTMs
‣ Initially, LSTMs were combined with attention layers
‣ Then, just the attention layers were kept – transformers: "Attention is all you need!"
‣ This led to the emergence of large language models
‣ GPT-3, BERT, ChatGPT (GPT-3.5 and GPT-4), etc.

Thank You
© Copyright National University of Singapore. All Rights Reserved.
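Appendix: a minimal sketch (not from the lecture materials) of steps 3.1 and 4 of the stock price prediction pipeline, together with the bullish score defined above. The DataFrames, column names, and toy values are hypothetical placeholders, and the control variables used in the lecture's regression are omitted.

```python
# Minimal sketch: aggregate per-tweet sentiment to stock-day level, compute the
# bullish score from the slides, and regress the closing price on sentiment.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-tweet predictions (stock, date, predicted class) and daily closing prices
tweets = pd.DataFrame({
    "stock": ["TSLA"] * 4 + ["AAPL"] * 4,
    "date":  ["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"] * 2,
    "pred_class": [2, 0, 2, 2, 1, 0, 0, 2],   # 0 = negative, 1 = neutral, 2 = positive
})
prices = pd.DataFrame({
    "stock": ["TSLA", "TSLA", "AAPL", "AAPL"],
    "date":  ["2022-01-03", "2022-01-04"] * 2,
    "close": [399.9, 383.2, 182.0, 179.7],
})

# 3.1 Average sentiment per stock per day (classes mapped to -1 / 0 / +1)
tweets["sentiment"] = tweets["pred_class"].map({0: -1, 1: 0, 2: 1})
daily = tweets.groupby(["stock", "date"]).agg(
    avg_sentiment=("sentiment", "mean"),
    n_pos=("sentiment", lambda s: int((s > 0).sum())),
    n_neg=("sentiment", lambda s: int((s < 0).sum())),
).reset_index()

# Bullish score from the slides: log((1 + #positive) / (1 + #negative)); > 0 => bullish, else bearish
daily["bullish_score"] = np.log((1 + daily["n_pos"]) / (1 + daily["n_neg"]))
daily["bullish"] = daily["bullish_score"] > 0

# 4. Regress the closing price on average sentiment (control variables omitted in this sketch)
data = daily.merge(prices, on=["stock", "date"])
X = sm.add_constant(data[["avg_sentiment"]])
ols = sm.OLS(data["close"], X).fit()
print(ols.summary())   # inspect the sign and significance of the sentiment coefficient
```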