Web and Text Analytics Course - Week 6 PDF
Document Details
University of Macedonia
2024
Evangelos Kalampokis
Summary
This document details the content of the Week 6 lecture on Web and Text Analytics. It reviews several classification approaches applied to the Amazon reviews case (XGBoost, SVC, Naïve Bayes, Random Forest) and then covers sentiment analysis with Naïve Bayes, including Bayes’ rule, conditional probabilities, Laplacian smoothing, and the log likelihood, with code examples and related discussion.
Full Transcript
Web and Text Analytics 2024-25 – Week 6
Evangelos Kalampokis
https://kalampokis.github.io
http://islab.uom.gr

Amazon reviews case - Overview
▪ Several approaches to classification
– XGBoost Classifier (with or without hyperparameter search)
– Support Vector Classifier (SVC)
– Naïve Bayes
– Random Forest
▪ Less experimentation with different feature engineering methods
– Standard feature extraction with the initial methods (CountVectorizer, TFIDF, n-grams)
– Some solutions experimented with other methods (e.g., Tweet Tokenizer, WordNetLemmatizer)

Notable examples – Discussion (1.1), (1.2), (2), (3), (4.1), (4.2), (5)

Sentiment Analysis with Naïve Bayes
https://www.coursera.org/learn/classification-vector-spaces-in-nlp/home/week/2

Probabilities
▪ Probability is fundamental to many applications in NLP. We can use it to help classify whether a tweet is positive or negative.
– Probabilities
– Conditional probabilities
– Bayes’ rule
▪ Imagine we have an extensive corpus of tweets, each of which can be categorized as either positive or negative sentiment, but not both.
▪ Within that corpus, the word happy sometimes appears in tweets labeled positive and sometimes in tweets labeled negative.

Bayes’ Rule
▪ In general, Bayes’ rule states that the probability of X given Y is equal to the probability of Y given X times the ratio of the probability of X over the probability of Y: P(X|Y) = P(Y|X) × P(X) / P(Y).
▪ We can calculate the probability of X given Y if we already know the probability of Y given X and the ratio of the probabilities of X and Y.

Probabilities – Positive Tweet
▪ One way to think about probabilities is by counting how frequently events occur.
▪ Suppose we define event A as “a tweet is labeled positive”. Then the probability of event A, P(A), is the number of positive tweets in the corpus divided by the total number of tweets in the corpus.

Probabilities – “Happy” tweets
▪ We can define event B in a similar way, by counting the tweets that contain the word happy.

Probability of the intersection – “Happy” positive tweets
▪ There is another way of looking at it.
▪ Take a look at the section of the diagram where tweets are labeled positive and also contain the word happy.
▪ In the context of this diagram, the probability that a tweet is labeled positive and also contains the word happy is just the ratio of the area of the intersection divided by the area of the entire corpus.

Conditional probabilities
▪ Instead of the entire corpus, we only consider tweets that contain the word happy (i.e., only the tweets inside the blue circle).
▪ In this case, the probability that a tweet is positive, given that it contains the word happy, is simply the number of tweets that are positive and also contain the word happy, divided by the number of tweets that contain the word happy.

Conditional probabilities
▪ We can make the same case for positive tweets: restricting to the positive tweets, the probability that a tweet contains the word happy, given that it is positive, is the count of the intersection divided by the count of positive tweets.
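To make the counting argument above concrete, here is a minimal Python sketch (not from the slides) that estimates P(Positive), P("happy"), and the two conditional probabilities from a toy labeled corpus and checks that Bayes’ rule ties them together. The tweets, counts, and variable names are invented for illustration.

# A minimal sketch (not from the slides): estimating the probabilities above from a
# toy labeled corpus. The tweets, counts, and variable names are invented for illustration.
tweets = [
    ("I am happy because I am learning NLP", "positive"),
    ("happy happy happy", "positive"),
    ("great day, feeling good", "positive"),
    ("I am sad, this is awful", "negative"),
    ("not happy at all", "negative"),
]

n_total = len(tweets)
n_positive = sum(1 for _, label in tweets if label == "positive")
n_happy = sum(1 for text, _ in tweets if "happy" in text.lower())
n_pos_and_happy = sum(1 for text, label in tweets
                      if label == "positive" and "happy" in text.lower())

p_positive = n_positive / n_total            # P(Positive)
p_happy = n_happy / n_total                  # P("happy")
p_pos_and_happy = n_pos_and_happy / n_total  # P(Positive ∩ "happy")

# Conditional probabilities, as on the slides
p_pos_given_happy = p_pos_and_happy / p_happy     # P(Positive | "happy")
p_happy_given_pos = p_pos_and_happy / p_positive  # P("happy" | Positive)

# Bayes' rule relates the two conditional probabilities
assert abs(p_pos_given_happy - p_happy_given_pos * p_positive / p_happy) < 1e-9
print(p_positive, p_happy, p_pos_given_happy)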
Conditional probabilities
▪ The probability of a tweet being positive, given that it has the word happy, is equal to the probability of the intersection between the tweets that are positive and the tweets that have the word happy, divided by the probability that a tweet from the corpus has the word happy: P(Positive | "happy") = P(Positive ∩ "happy") / P("happy").

Bayes’ Rule – “Happy” tweets
▪ Applying Bayes’ rule to this example: P(Positive | "happy") = P("happy" | Positive) × P(Positive) / P("happy").

Bayes’ Rule
▪ In general, Bayes’ rule states that the probability of X given Y is equal to the probability of Y given X times the ratio of the probability of X over the probability of Y: P(X|Y) = P(Y|X) × P(X) / P(Y).
▪ We can calculate the probability of X given Y if we already know the probability of Y given X and the ratio of the probabilities of X and Y.

Naïve Bayes for Sentiment Analysis
▪ Naïve Bayes is an example of supervised machine learning and shares many similarities with the logistic regression method.
▪ It is called naïve because the method assumes that the features used for classification are all independent.
▪ It works nicely as a simple method for sentiment analysis.

Naïve Bayes for Sentiment Analysis
▪ We have two corpora, i.e., one for the positive tweets and one for the negative tweets.
▪ We first extract the vocabulary, i.e., all the different words that appear in the corpus, along with their counts.

Naïve Bayes for Sentiment Analysis (Positive class)
▪ To get the conditional probabilities, we divide the frequency of each word in a class by the total number of words in that class.

Naïve Bayes for Sentiment Analysis (Negative class)
▪ To get the conditional probabilities, we divide the frequency of each word in a class by the total number of words in that class.
▪ For the word I in the negative class, we get 3/12.

Table of conditional probabilities
▪ If we sum the probabilities within each class, we get 1.

Table of conditional probabilities
▪ Words that are equally probable in both classes don’t add anything to the sentiment.
– I, am, learning, NLP
▪ Words with a significant difference between the two probabilities carry a lot of weight in determining the tweet’s sentiment.
– happy, sad, not
▪ “because” only appears in the positive corpus; its conditional probability for the negative class is 0.
– When this happens, we can’t compare between the two corpora, which becomes a problem for our calculations. To avoid this, we smooth the probability function.

Naïve Bayes inference condition for binary classification
▪ To use the table of probabilities to predict the sentiment of a tweet, we apply the Naïve Bayes inference condition rule for binary classification.
▪ We take the product, across all the words in the tweet, of each word’s probability in the positive class divided by its probability in the negative class: ∏ P(wᵢ | pos) / P(wᵢ | neg).

Predict the sentiment of a tweet
▪ We created a table to store the conditional probabilities of the words in our vocabulary and applied the Naïve Bayes inference condition rule for binary classification of a tweet.
▪ The value is higher than one, which means that, overall, the words in the tweet are more likely to correspond to a positive sentiment.
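As a minimal sketch of the inference rule just described (not the course assignment code), the following builds per-class word counts from two tiny corpora and takes the product of probability ratios over a tweet’s words. The corpora and function names are invented for illustration.

# A minimal sketch (not the course assignment code): building the table of conditional
# probabilities from two tiny corpora and applying the Naïve Bayes inference condition.
from collections import Counter

positive_tweets = ["I am happy because I am learning NLP", "I am happy, not sad"]
negative_tweets = ["I am sad, I am not learning NLP", "I am sad, not happy"]

def word_counts(tweets):
    """Count word occurrences over a list of tweets (lowercased, commas stripped)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().replace(",", " ").split())
    return counts

pos_counts = word_counts(positive_tweets)
neg_counts = word_counts(negative_tweets)
n_pos = sum(pos_counts.values())  # total number of words in the positive class
n_neg = sum(neg_counts.values())  # total number of words in the negative class

def naive_bayes_score(tweet):
    """Product over the tweet's words of P(w | pos) / P(w | neg); > 1 means positive."""
    score = 1.0
    for word in tweet.lower().split():
        p_w_pos = pos_counts[word] / n_pos  # P(word | positive)
        p_w_neg = neg_counts[word] / n_neg  # P(word | negative)
        if p_w_pos == 0 or p_w_neg == 0:
            # the zero-probability problem described above (e.g. "because");
            # skipped here, fixed properly by Laplacian smoothing (next slides)
            continue
        score *= p_w_pos / p_w_neg
    return score

print(naive_bayes_score("I am happy learning NLP"))  # > 1, so predicted positive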
Laplacian smoothing
▪ Laplacian smoothing is a technique we can use to avoid the probabilities being zero.
▪ Smoothing the probability function means that we use a slightly different formula from the original: P(w | class) = (freq(w, class) + 1) / (N_class + V), where N_class is the total number of words in the class and V is the number of unique words in the vocabulary.

Probability table with Laplacian Smoothing
▪ Bear in mind that in this example we have eight unique words in our vocabulary (V = 8).

Probability table with Laplacian Smoothing
▪ This is the final probability table.
▪ The numbers shown here have been rounded.

Ratio of probabilities
▪ Words can have many shades of emotional meaning, but for the purpose of sentiment classification they are simplified into three categories, namely neutral, positive, and negative.
▪ These categories can be estimated numerically just by dividing the corresponding conditional probabilities in this table, i.e., ratio(w) = P(w | pos) / P(w | neg).

Naïve Bayes inference condition rule
▪ In the example before, we assumed that the prior is 1 because we had exactly the same number of positive and negative tweets.
▪ With the addition of the prior ratio, we now have the full Naïve Bayes formula for binary classification: P(pos)/P(neg) × ∏ P(wᵢ | pos) / P(wᵢ | neg) > 1, where the first factor is the prior and the product is the likelihood.
▪ When we are building our own application, remember that this term becomes important for unbalanced datasets.

Log Likelihood
▪ Calculating the sentiment probability requires multiplying many numbers with values between 0 and 1.
▪ Carrying out such multiplications on a computer runs the risk of numerical underflow, when the number returned is so small it can’t be stored on the device.
▪ Taking the logarithm turns the product into a sum: the score becomes the log prior, log(P(pos)/P(neg)), plus the log likelihood, Σ log(P(wᵢ | pos) / P(wᵢ | neg)).

Calculating Lambda (λ)
▪ The log of the ratio we use for each word is called its lambda: λ(w) = log(P(w | pos) / P(w | neg)).

Table of lambdas
▪ For the word “I”, we get the logarithm of 0.05 divided by 0.05, i.e., the logarithm of 1, which is equal to 0.
▪ “I” would be classified as neutral, at 0.

Sentiment of a tweet
▪ We can calculate the log likelihood of the tweet as the sum of the lambdas of each word in the tweet.
▪ Remember that previously the tweet was positive if the product was bigger than 1; since the log of 1 equals 0, the tweet is now positive if the log score is bigger than 0.

Inference
▪ A positive value indicates that the tweet is positive; a value less than 0 indicates that the tweet is negative.

Sentiment Analysis

Train a Naïve Bayes Model
▪ Step 0: Collect and annotate the corpus
▪ Step 1: Preprocess
▪ Step 2: Word count
▪ Step 3: P(w|class)
▪ Step 4: Get lambda
▪ Step 5: Get the log prior

Train a Naïve Bayes Model (Steps 0 and 1), (Step 2), (Steps 3 and 4), (Step 5)

Predict using Naïve Bayes
▪ With the estimated log prior and the table of lambdas, we can predict the sentiment of a new tweet.

Test Naïve Bayes
▪ Validation set
– This data was set aside during training and is composed of a set of raw tweets X_val and their corresponding sentiments Y_val.
▪ We compute the score of each entry in X_val and evaluate whether each score is greater than 0. This produces a vector of zeros and ones, indicating whether the predicted sentiment is negative or positive, respectively.
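The training steps above can be condensed into a short sketch. This is a minimal illustration under simplified assumptions (whitespace tokenization instead of the full preprocessing step), not the Coursera assignment code, and the function names (tokenize, train_naive_bayes) are invented here.

# A minimal training sketch for Steps 0-5 above (not the course assignment code).
# Preprocessing is reduced to lowercasing and whitespace splitting for brevity.
import math
from collections import Counter

def tokenize(tweet):
    """Step 1 (simplified preprocessing): lowercase, strip commas, split on whitespace."""
    return tweet.lower().replace(",", " ").split()

def train_naive_bayes(pos_tweets, neg_tweets):
    """Steps 2-5: word counts, smoothed P(w | class), the lambdas, and the log prior."""
    pos_counts, neg_counts = Counter(), Counter()
    for tweet in pos_tweets:
        pos_counts.update(tokenize(tweet))
    for tweet in neg_tweets:
        neg_counts.update(tokenize(tweet))

    vocab = set(pos_counts) | set(neg_counts)
    v = len(vocab)                    # number of unique words in the vocabulary
    n_pos = sum(pos_counts.values())  # total number of words in the positive class
    n_neg = sum(neg_counts.values())  # total number of words in the negative class

    lambdas = {}
    for word in vocab:
        # Laplacian smoothing: P(w | class) = (freq(w, class) + 1) / (N_class + V)
        p_w_pos = (pos_counts[word] + 1) / (n_pos + v)
        p_w_neg = (neg_counts[word] + 1) / (n_neg + v)
        lambdas[word] = math.log(p_w_pos / p_w_neg)  # lambda(w) = log of the ratio

    # Log prior: log(P(pos) / P(neg)); equal to 0 for a balanced corpus.
    logprior = math.log(len(pos_tweets) / len(neg_tweets))
    return logprior, lambdas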
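Continuing that sketch, prediction and evaluation on a validation set could then look as follows; the example tweets and labels are made up, and words outside the vocabulary simply contribute a lambda of 0.

# A minimal prediction/evaluation sketch, reusing tokenize() and train_naive_bayes()
# from the training sketch above. The data below is invented for illustration.
def naive_bayes_predict(tweet, logprior, lambdas):
    """Score = log prior + sum of the lambdas of the tweet's words; > 0 means positive."""
    return logprior + sum(lambdas.get(word, 0.0) for word in tokenize(tweet))

def test_naive_bayes(x_val, y_val, logprior, lambdas):
    """Accuracy on a held-out validation set of raw tweets and 0/1 sentiment labels."""
    y_hat = [1 if naive_bayes_predict(t, logprior, lambdas) > 0 else 0 for t in x_val]
    return sum(int(pred == y) for pred, y in zip(y_hat, y_val)) / len(y_val)

pos = ["I am happy because I am learning NLP", "happy happy great day"]
neg = ["I am sad, I am not learning NLP", "this is awful, not happy"]
logprior, lambdas = train_naive_bayes(pos, neg)
print(naive_bayes_predict("I am learning and I am happy", logprior, lambdas))  # > 0
print(test_naive_bayes(["so happy today", "feeling sad"], [1, 0], logprior, lambdas))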