Intelligent Systems & Techniques (IS4242) Lecture Notes - Electronic Word of Mouth

Document Details

Uploaded by DexterousFern6890

National University of Singapore (NUS)

Aditya Karanam

Tags

electronic word of mouth (eWOM), sentiment analysis, natural language processing

Summary

These lecture notes cover Electronic Word of Mouth (eWOM) and its application to stock price prediction. They discuss why consumers engage in eWOM, what makes it persuasive, and how firms can manage it, and then show how eWOM sentiment extracted with natural language processing techniques (word embeddings and bidirectional LSTMs) can be used to predict stock prices.

Full Transcript

IS4242 INTELLIGENT SYSTEMS & TECHNIQUES
L10 – Electronic Word of Mouth
Aditya Karanam
© Copyright National University of Singapore. All Rights Reserved.

Announcements
▸ Project check-in
‣ The deadline is in Week 12

In this Lecture…
▸ What are Word of Mouth (WOM) and Electronic Word of Mouth (eWOM)?
▸ Why do consumers engage in eWOM?
‣ Motivations and persuasive power
▸ How to manage eWOM?
▸ Examining the impact of eWOM sentiment on stock prices
‣ Brief overview of word embeddings
‣ Intuition: Bidirectional Long Short-Term Memory networks

Word of Mouth
▸ Every day, we have many conversations in which we actively seek product-related input from family, friends, and even virtual strangers
‣ Over 2.4 billion conversations each day involve brands
▸ These conversations provide marketing for the brands; this is called Word-of-Mouth Marketing, or simply word of mouth
▸ Considered the most effective of the various forms of marketing
‣ 92% of consumers trust recommendations from family and friends more than any other type of advertising (Nielsen 2012)

Word of Mouth: Definition
▸ Definition: oral, person-to-person communication between a receiver and a communicator whom the receiver perceives as non-commercial, concerning a brand, a product, or a service
‣ WOM is interpersonal communication and thus differs from mass communication such as advertising and other impersonal channels
‣ The content of these communications is commercial: individuals use WOM for all kinds of communication, but WOM in marketing refers to messages about commercial entities (products, brands, etc.)
‣ Although the content of the communication is commercial, the communicators are not commercially motivated (at least in the receiver’s eyes)
‣ WOM is commercial in content but non-commercial in perception

Online is seamlessly integrated into our offline lives!
▸ In today’s world, there is no offline or online socializing – it is simply socializing!
▸ In 2020, every minute, TikTok users watched 167 million videos, Instagram users shared 65,000 photos, YouTube users streamed 694,000 hours of video, and 2 million Snapchats were sent (Domo 2021)!
▸ Firms are also blending their offline and online communications
‣ E.g., TV and print ads ask consumers to go online and like, follow, or subscribe to their Facebook pages

Electronic Word of Mouth (eWOM)
▸ Definition: the dynamic and ongoing information exchange process between consumers (potential, actual, or former) regarding a product, service, brand, or company, which is available to a multitude of people and institutions
▸ eWOM communication is not a static process but a dynamic and ongoing information exchange
‣ Messages can spread online spontaneously
▸ Unlike the definition of WOM, it does not require that the receiver perceives the source of the communication as non-commercial

Electronic Word of Mouth: Examples
▸ The power of eWOM lies in making products or product promotions go viral
‣ This can have a positive or negative impact on sales
▸ Meme stocks are a prime example of eWOM
‣ Stock prices are driven by viral posts on social media
‣ Example: GameStop

Word of Mouth Characteristics
▸ What motivates consumers to produce word of mouth?
‣ Altruism
‣ Social benefits
‣ Economic incentives
▸ What makes eWOM so persuasive?
‣ Credibility of the sender, helpfulness of the content, etc.
▸ What are firms’ response strategies?
Motivations to produce eWOM: Altruism
▸ Altruism: improving the welfare of one or more people other than oneself
▸ People with altruistic motives volunteer to share eWOM to help other customers
‣ E.g., people share reviews simply because it helps others on platforms such as Amazon, Yelp, Google, etc.
▸ Connected to the feeling of pleasure obtained from helping, or the empathy an individual feels toward others

Motivations to produce eWOM: Social Benefits
▸ Consumers engage in eWOM to participate in and be part of a virtual community
▸ Affiliation with a virtual community provides a social benefit to an individual through identification and social integration
‣ E.g., Yelp Elite Squad, Destination Expert on TripAdvisor, moderator on Reddit or Stack Exchange, etc.

Motivations to produce eWOM: Economic Incentives
▸ Consumers also engage in eWOM to obtain an economic incentive
▸ These can take the form of web points or coupons provided through opinion platforms in exchange for eWOM
‣ E.g., the Amazon Vine program for reviewers, higher chances of recruitment based on status on Stack Exchange, GitHub contributions, etc.
▸ The economic-incentive motivation is a distinctive characteristic of eWOM
‣ Compared to traditional WOM, eWOM is exchanged through third-party businesses, which have become highly successful
‣ E.g., review aggregator websites such as Epinions.com, Rotten Tomatoes, etc.

The Persuasiveness of eWOM
▸ Credibility: the degree to which an individual perceives a recommendation from others as believable or factual
▸ Helpfulness (or usefulness): the degree to which the information assists consumers in making their purchase decisions
▸ If people believe the received information is credible and helpful, they are more confident in using eWOM information to make purchase decisions

The Persuasiveness of eWOM
▸ Perceived credibility and helpfulness are influenced by several factors:
‣ Quality of the content: average rating, length of the review, consistency across recommendations, etc.
‣ Characteristics of the information source: expertise, trustworthiness, attractiveness, etc.
▸ Vendors with low-quality products have been found to manipulate their reviews to influence consumer decisions
‣ E.g., Reddit Marketing Pro sells fake reviews, backlinks, Reddit upvotes, etc. (https://reddit-marketing.pro/)

The Persuasiveness of eWOM
▸ Emotional content, especially negative content, spreads faster online
‣ People tend to latch on to negative information more than positive information
‣ E.g., the Facebook mood manipulation experiment
▸ News organizations use negative news to improve engagement (https://www.hsph.harvard.edu/chc/2023/05/01/negative-news-headline-more-clicks/)
‣ A user’s political affiliation is revealed more by their hostility toward other groups than by strong affiliation with their own group

The Persuasiveness of eWOM
▸ On Twitter, fake news has been found to spread faster and farther than the truth because it is more novel and sensational!
▸ This problem is exacerbated by deep fakes!
‣ E.g., fake images of Trump being arrested (https://www.bbc.com/news/world-us-canada-65069316)

Managing eWOM
▸ Over the internet, consumers can express their complaints and negative experiences to a multitude of people at reduced time and cost
▸ Negative eWOM can lead to a negative perception of the brand or company
‣ And negativity spreads faster!
▸ Monitoring eWOM is therefore of utmost priority for all companies
‣ Companies should capture and analyse all reviews, opinions, etc.

Social Media Monitoring: Example
▸ Analyze the opinion or sentiment of a group (e.g., by region)
‣ Run sentiment analysis on all social media mentions of your brand and categorize them by urgency
‣ Prioritize actions for brand management
▸ Track sentiment trends for your brand and the competition

Analyzing the Impact of eWOM Sentiment on Stock Price

Task: Sentiment Analysis and Stock Prediction
▸ eWOM is unstructured data (text, images, videos, etc.)
‣ We use textual data (tweets) for this task
▸ Consumers may express several kinds of opinion, such as sentiment, sarcasm, and suggestions, in their reviews or tweets about a company or product
▸ Objective: extract consumer sentiment and use it to predict the stock price
‣ Two tasks:
‣ Identifying sentiment (classification) – a natural language processing problem
‣ Predicting the stock price (regression)

Natural Language Processing (NLP)
▸ Language modeling
▸ Information extraction
▸ Summarization
▸ Machine translation
▸ Dialog systems
▸ Opinion mining
‣ Sentiment analysis
‣ Polarity: positive/neutral/negative opinion

Opinion
▸ Text contains facts and opinions
‣ Opinion: a subjective expression about a topic
‣ Polarity: positive, negative, or neutral opinion
‣ “The earphone is of bad quality; it broke in two days”
‣ “I am so happy to see you”
‣ “The company’s stock price is above its fundamental value”
‣ “The battery life of this camera is too short”
‣ “Earth is an oblate spheroid”

Data
▸ Tweets on stock prices from Stocktwits.com
‣ 80,793 valid stock microblog postings
‣ 25 unique stocks, including Tesla, Apple, Microsoft, etc.
▸ Stock price data from Yahoo! Finance for the period from Oct 4, 2021, to Sept 26, 2022, covering a total of 51 weeks
▸ Hand-labeled data of 1,300 tweets between April 9, 2020, and July 16, 2020, labeled as positive, negative, or neutral, for building the model
‣ A multi-class classification problem

Textual Data
▸ Words are the fundamental units of text (in English)
‣ They obtain meaning from the surrounding words
‣ “River bank” vs. “Federal bank”
▸ Two challenges in performing the task:
‣ Representing words as numeric vectors that encode their meaning
‣ Designing a classifier that encodes sequential dependency across words

Textual Data: Representation
▸ We need to represent words as numeric vectors to use them as features in machine-learning models
▸ Vector representations need two properties:
‣ Dense, to help learning
‣ Semantically useful: they should capture the meaning of the words
‣ E.g., “great” and “enjoyable” should have similar representations
▸ Feature representations: Bag of Words, TF-IDF, and word embeddings

Document-Term Matrix
▸ Document: a set of words
‣ What counts as a document depends on what is to be analyzed (sentence, tweet, paragraph, report, …)
▸ Term: obtained after preprocessing (stemming, lemmatization, etc.)
‣ Can be words, bigrams, n-grams, etc.
▸ With N documents and V terms in the vocabulary, the document-term matrix has N rows and V columns

Bag of Words (BoW) Model: Example
▸ The corpus has 2 documents:
‣ Document 1: “I like tomatoes more than apples.”
‣ Document 2: “I like reading. I like apples.”
▸ Vocabulary: 7 words: {I, like, tomatoes, more, than, apples, reading}
▸ BoW representation using counts (see the sketch below):

                I   like   reading   tomatoes   apples   more   than
  Document 1    1    1        0          1         1       1      1
  Document 2    2    2        1          0         1       0      0
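The document-term matrix above can be reproduced in a few lines. A minimal sketch using scikit-learn’s CountVectorizer (an assumed library choice; the notes do not prescribe one). A custom token pattern is used so that the one-letter token “I” is kept; the default lowercasing makes the column appear as “i”, and columns come out in alphabetical order.

```python
# Sketch: Bag-of-Words document-term matrix with scikit-learn (assumed library choice).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "I like tomatoes more than apples.",   # Document 1
    "I like reading. I like apples.",      # Document 2
]

# token_pattern keeps 1-letter tokens like "I"; the default pattern would drop them.
vectorizer = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
X = vectorizer.fit_transform(docs)           # sparse 2 x 7 document-term matrix

print(vectorizer.get_feature_names_out())    # ['apples' 'i' 'like' 'more' 'reading' 'than' 'tomatoes']
print(X.toarray())
# [[1 1 1 1 0 1 1]
#  [1 2 2 0 1 0 0]]
```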
Term Frequency – Inverse Document Frequency (TF-IDF)
▸ Notation: $t$: term, $d$: document, $D$: corpus of $N$ documents, $c_{t,d}$: count of term $t$ in document $d$
▸ Term frequency: $\mathrm{tf}(t,d) = \dfrac{c_{t,d}}{\sum_{t' \in d} c_{t',d}}$
▸ Inverse document frequency: $\mathrm{idf}(t,D) = \log \dfrac{N}{1 + |\{d \in D : t \in d\}|}$
▸ TF-IDF: $\text{tf-idf}(t,d,D) = \mathrm{tf}(t,d) \times \mathrm{idf}(t,D)$
▸ $\mathrm{idf}(t,D)$ is a weight factor
‣ More frequent terms → lower weight; less frequent terms → higher weight
‣ A term that occurs in many documents may not be very useful (e.g., stop words)

TF-IDF: Toy Example
▸ Assume a vocabulary of V = 100 terms, N = 10 documents, and 100 total words in each of D1 and D2; “this” appears in 8 documents and “work” appears in 1 document
▸ Term counts:

          this   work   example   …
   D1       1      3       6      …
   D2       2      0       1      …

▸ tf(“this”, D1) = 1/100, tf(“this”, D2) = 2/100
▸ idf(“this”, D) = log(10 / (1 + 8)) = log(10/9) ≈ 0.04 (base-10 logarithm, as the values imply)
▸ tf-idf(“this”, D1, D) = 0.04 × 0.01 = 0.0004; tf-idf(“this”, D2, D) = 0.04 × 0.02 = 0.0008
▸ tf(“work”, D1) = 3/100, tf(“work”, D2) = 0
▸ idf(“work”, D) = log(10 / (1 + 1)) = log(5) ≈ 0.7
▸ tf-idf(“work”, D1, D) = 0.7 × 0.03 = 0.021; tf-idf(“work”, D2, D) = 0
▸ Resulting TF-IDF values:

          this      work     example   …
   D1    0.0004    0.021        …
   D2    0.0008    0            …
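A small sketch that reproduces the toy example’s numbers with the tf and idf definitions above (plain Python, base-10 logarithm as the example’s values imply; the count and document-frequency dictionaries are taken from the example). Note that scikit-learn’s TfidfVectorizer uses a slightly different idf formula from the one defined in these notes, which is why the computation here is done by hand.

```python
# Sketch: TF-IDF toy example using the lecture's formulas (base-10 log).
import math

N = 10                                     # documents in the corpus
total_words = {"D1": 100, "D2": 100}       # sum of term counts per document
counts = {                                 # c_{t,d}: term counts per document
    "D1": {"this": 1, "work": 3, "example": 6},
    "D2": {"this": 2, "work": 0, "example": 1},
}
doc_freq = {"this": 8, "work": 1}          # |{d in D : t in d}|

def tf(term, doc):
    return counts[doc].get(term, 0) / total_words[doc]

def idf(term):
    return math.log10(N / (1 + doc_freq[term]))

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(round(tf_idf("this", "D1"), 4))   # ~0.0005 (the slides round idf to 0.04 first, giving 0.0004)
print(round(tf_idf("work", "D1"), 4))   # 0.021
print(round(tf_idf("work", "D2"), 4))   # 0.0
```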
Problems with this Representation
▸ Synonymy
‣ Different terms, same meaning
‣ “comical” and “hilarious” will be treated as different, unrelated terms
▸ Document-term matrix
‣ High dimensional: vocabulary sizes run into the thousands
‣ Sparse: mostly zeros

Word2Vec
▸ Dense representations based on neural networks
▸ Lower-dimensional embeddings (k < V)
▸ Easy to learn
▸ Generalizes better:
‣ Empirically gives better results
‣ Easy to incorporate new words

Word2Vec: Key Idea
▸ A brilliant use of a predictive model to inject “intelligence” (or “meaning”) into the representations
▸ Predict (don’t count) the “context” of a word from a text corpus
‣ Context: nearby/surrounding words
▸ Classification problem: predict a context word given the focal word
‣ We don’t care about the prediction task directly
‣ We care about the weights learned by the network

Word2Vec
▸ 1. Generate training data for classification
▸ 2. Train the network
▸ 3. Use the weights as embeddings
▸ Below is a brief overview of the Skip-Gram with Negative Sampling (SGNS) architecture

1. Generating the Training Data
▸ Take a large corpus such as Wikipedia
▸ Positive samples: generate (word, context) pairs for some window length(s), e.g., windows of size 3, 4, or 5
▸ Negative samples: random word pairs that do not appear within a window length of each other

2. Train a Network
▸ Input: one-hot encoding of a word
▸ Outputs: C context words (one-hot encoded)
▸ A single hidden layer
▸ Example: “The quick brown fox jumps”
‣ Input: [0 0 0 1 0]
‣ Hidden layer: 5 × 300 weight matrix (300 neurons in the hidden layer)
‣ Output: C = 4 context-word predictions (window size of 5)

2. Train a Network
▸ The weight matrix W acts like a lookup table for the input word
▸ The output layer computes the probability of each context word (softmax)

3. Embeddings
▸ Use W (or various combinations of W and W′) as embeddings
▸ Why does it work?
‣ If two words have similar meanings, their contexts will be similar
‣ That, in turn, makes their embeddings in the weight matrix similar
▸ These embeddings seem to learn the underlying meanings of words!
‣ King – Man + Woman = Queen
▸ Use these feature representations for classification with a neural network
▸ The Gensim library in Python produces efficient embeddings (see the sketch below)
‣ Pre-trained embeddings of large corpora are also publicly available, e.g., the GloVe embeddings trained on ~6 billion words
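A minimal sketch of steps 1 to 3 using the Gensim library mentioned above. The toy corpus and the parameter values (300 dimensions, window of 5, 5 negative samples) are illustrative, not prescribed by the notes; in practice the corpus would be something large such as Wikipedia.

```python
# Sketch: skip-gram with negative sampling (SGNS) embeddings via Gensim (illustrative corpus/parameters).
from gensim.models import Word2Vec

# Tokenized corpus; Gensim generates the (word, context) training pairs internally.
sentences = [
    ["the", "quick", "brown", "fox", "jumps"],
    ["the", "lazy", "dog", "sleeps"],
    # ... many more tokenized sentences ...
]

model = Word2Vec(
    sentences,
    vector_size=300,   # embedding dimension k (k << V)
    window=5,          # context window size
    sg=1,              # 1 = skip-gram, 0 = CBOW
    negative=5,        # number of negative samples per positive pair
    min_count=1,
    epochs=10,
)

vec = model.wv["fox"]                          # 300-dimensional embedding for "fox"
print(model.wv.most_similar("fox", topn=3))    # nearest words by cosine similarity
```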
What Classification Model Should We Use?
▸ Feedforward neural networks: each position in the feature vector has fixed semantics
‣ The word “great” is treated as if it is unrelated to the word “movie”
▸ Instead, we need to process each word by exploiting the context (the surrounding words) of each token

RNN Abstraction
▸ RNNs pass information between cells in the same layer to maintain context
‣ A cell takes the input $x_t$ and the previous hidden state $h_{t-1}$ and produces the next hidden state $h_t = \phi(W x_t + U h_{t-1})$ and an output $y_t$, where $\phi$ is an activation function (tanh) and $W$, $U$ are weight matrices
‣ This helps capture dependencies in sequential data such as text, time series, etc.

RNN: The Problem of Vanishing Gradients
▸ Textual data can be long (~60 words in a sentence)
‣ During backpropagation over long sequences, the gradients flowing back from later words to earlier ones may quickly shrink to zero, making it hard to capture long-range dependencies
‣ E.g., the gradient of tanh is almost zero when its input is outside [-2, 2]
▸ This is known as the problem of vanishing gradients

Solution: Long Short-Term Memory (LSTM) Cells
▸ LSTMs: RNNs with long-term memory
‣ They maintain two cell states (memories): a long-term state and a short-term state
▸ The long-term state remembers, i.e., preserves, the gradient
‣ This is achieved through a clever arrangement of activation functions
▸ You can stack as many LSTM cells as you like (just as with neurons)

Bidirectional LSTM (BiLSTM)
▸ Consider a sentence of $n$ words: $w_1, w_2, w_3, \ldots, w_n$
▸ Forward LSTM: the state for $w_i$ is computed using the words $w_{1:i}$ (it ignores $w_{i+1:n}$)
▸ Backward LSTM: the state for $w_i$ is computed using the words $w_{i:n}$ (it ignores $w_{1:i-1}$)
▸ Stack the forward and backward LSTMs together to capture context from all the words
‣ The two directions’ states are combined by pooling (max, avg) or concatenation

Stock Price Prediction
▸ 1. Start from pre-trained word vectors trained on ~6 billion words (the GloVe embeddings mentioned earlier)
▸ 2. Train the BiLSTM model with the labeled tweet data
▸ 3. Predict the sentiment for all tweets: ~80,000 sentences
‣ 3.1 Average the sentiment across tweets for a given stock on a given day
▸ 4. Regress the stock price on this sentiment measure together with other control variables
‣ Check the coefficient and significance of the sentiment variable (a code sketch of steps 2 to 4 follows after the bullish-score definition below)

Stock Price Prediction
▸ Regression results for the full data: sentiment is significant and positive

Stock Price Prediction
▸ Stock bullish score: $\log \dfrac{1 + \text{count of positive tweets}}{1 + \text{count of negative tweets}}$; a stock is bullish if the score > 0, else bearish
▸ eWOM has higher predictive power for bearish stocks than for bullish stocks
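A minimal end-to-end sketch of steps 2 to 4 and the bullish score, assuming Keras/TensorFlow for the BiLSTM, pandas for aggregation, and statsmodels for the regression; the notes do not prescribe these libraries. All data below are synthetic stand-ins for the labeled tweets, the ~80,000 StockTwits messages, and the Yahoo! Finance prices; the label convention (0 = negative, 1 = neutral, 2 = positive) and the signed score P(positive) − P(negative) are illustrative choices, not the lecture’s specification.

```python
# Sketch: BiLSTM sentiment model -> daily average sentiment per stock -> OLS on stock price.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from tensorflow.keras import layers, models

VOCAB, EMB_DIM, MAX_LEN = 5000, 100, 40

# Pre-trained embedding matrix (in practice, GloVe rows per word index); random stand-in here.
emb_matrix = np.random.normal(size=(VOCAB, EMB_DIM)).astype("float32")

# Step 2: BiLSTM classifier over padded word-index sequences (0 = neg, 1 = neutral, 2 = pos).
emb_layer = layers.Embedding(VOCAB, EMB_DIM, trainable=False)
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    emb_layer,
    layers.Bidirectional(layers.LSTM(64)),    # concatenates forward and backward states
    layers.Dense(3, activation="softmax"),
])
emb_layer.set_weights([emb_matrix])           # load the pre-trained vectors
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

X_train = np.random.randint(1, VOCAB, size=(200, MAX_LEN))   # stand-in for the 1,300 labeled tweets
y_train = np.random.randint(0, 3, size=(200,))
model.fit(X_train, y_train, epochs=1, batch_size=32, verbose=0)

# Step 3: predict sentiment for all tweets, then average per stock per day.
X_all = np.random.randint(1, VOCAB, size=(1000, MAX_LEN))    # stand-in for the ~80k tweets
probs = model.predict(X_all, verbose=0)
tweets = pd.DataFrame({
    "stock": np.random.choice(["TSLA", "AAPL"], size=1000),
    "date": np.random.choice(pd.date_range("2021-10-04", periods=50), size=1000),
    "sentiment": probs[:, 2] - probs[:, 0],   # one simple signed score: P(pos) - P(neg)
})
daily = tweets.groupby(["stock", "date"], as_index=False)["sentiment"].mean()

# Bullish score per stock: log((1 + #positive tweets) / (1 + #negative tweets)).
tweets["label"] = probs.argmax(axis=1)
bullish = tweets.groupby("stock")["label"].apply(
    lambda s: np.log((1 + (s == 2).sum()) / (1 + (s == 0).sum())))

# Step 4: regress the stock price on average daily sentiment (control variables omitted here).
daily["close"] = np.random.uniform(100, 300, size=len(daily))   # stand-in for Yahoo! Finance prices
ols = sm.OLS(daily["close"], sm.add_constant(daily[["sentiment"]])).fit()
print(ols.params["sentiment"], ols.pvalues["sentiment"])         # coefficient and significance
```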

References
▸ Word of mouth:
‣ Ismagilova, E., Dwivedi, Y. K., Slade, E., & Williams, M. D. (2017). Electronic Word of Mouth (eWOM) in the Marketing Context.
▸ Word embeddings and BiLSTMs:
‣ http://colah.github.io/posts/2015-08-Understanding-LSTMs/
‣ https://www.cs.utexas.edu/~gdurrett/courses/fa2019/cs388.shtml

Additional: Current State of the Art in NLP
▸ BiLSTMs were state of the art in some contexts until the arrival of models such as GPT-3.5
▸ A series of innovations replaced LSTMs
‣ Initially, LSTMs were combined with attention layers
‣ Then just the attention layers were kept – transformers: “Attention is all you need!”
‣ This led to the emergence of large language models
‣ GPT-3, BERT, ChatGPT (GPT-3.5 and GPT-4), etc.

Thank You
© Copyright National University of Singapore. All Rights Reserved.