12-pLSI.pdf
Basics of Graphical Models /1

If we have variables $x_1, \dots, x_N$ and $y$, such that each $x_i$ depends on $y$, we can draw this as a graph with an edge from $y$ to each $x_i$, or equivalently with a plate over the $x_i$. The joint probability then is

$p(y, x_1, \dots, x_N) = p(y) \prod_{i=1}^{N} p(x_i \mid y)$

Plates serve as a shorthand notation for repeated elements.

Basics of Graphical Models /2

Notation elements:
prior parameter: parameter with predefined constant values
hidden variable: variable whose values need to be estimated
observed variable: variable whose values are given by the data
dependency: indicates variables that depend on each other
repeated elements (plate): shorthand to simplify repeated structures

Probabilistic Latent Semantic Indexing (pLSI) [Hofm99a; Hofm99b]

$p(w_{d,n} \mid d) = \sum_{k} p(w_{d,n} \mid z_{d,n} = k)\, p(z_{d,n} = k \mid d) = \sum_{k} \phi_{k, w_{d,n}}\, \theta_{d,k}$

where $w_{d,n}$ is the $n$-th word in document $d$, $z_{d,n}$ is the word's topic, $\theta_d$ is the topic distribution of document $d$, and $\phi_k$ is the word distribution of topic $k$. We then have the probability of observing the entire corpus $D$:

$p(D) = \prod_{d} \prod_{n} \sum_{k} \theta_{d,k}\, \phi_{k, w_{d,n}}$

This assumes conditional independence of words and documents given an unobserved topic $z$:

$p(w, d \mid z) = p(w \mid z)\, p(d \mid z)$ [BlNgJo03]

We can use the EM algorithm to find the most likely parameters.

Expectation-Maximization

EM (Expectation-Maximization) is an iterative procedure to optimize a statistical model:
1. Choose initial model parameters.
2. Estimate the expected latent variables (e.g., topic assignments) from the current parameters and the data.
3. Update the parameters to maximize the likelihood of observing the data.
4. Repeat (2.)–(3.) until a stopping condition holds.

Algorithm: Expectation-Maximization

The generic Expectation-Maximization algorithm alternates between the two steps: in the E-step, compute the expected complete-data log likelihood under the current parameters $\psi^{(t)}$,

$Q(\psi \mid \psi^{(t)}) = \mathbb{E}_{Z \mid X, \psi^{(t)}} \left[ \log p(X, Z \mid \psi) \right]$

and in the M-step, choose $\psi^{(t+1)} = \arg\max_{\psi} Q(\psi \mid \psi^{(t)})$. The log likelihood is improved by both the E-step and the M-step.

Probabilistic Generative Model

The model assumes our data set was generated by a process like this:
1. For every topic $k$, sample a word distribution $\phi_k$.
2. For every document $d$, sample a topic distribution $\theta_d$.
3. For every document $d$, generate words $w_{d,n}$:
   1. Sample a topic $z_{d,n}$ from $\theta_d$.
   2. Sample a word $w_{d,n}$ from the distribution $\phi_{z_{d,n}}$.
Note: this is not how we really generate documents, but our underlying assumption.

Likelihood Function of pLSI

In pLSI, we model each document as a mixture of topics:

$\log L = \sum_{d} \sum_{n} \log \sum_{k} \theta_{d,k}\, \phi_{k, w_{d,n}}$

EM-Algorithm for pLSI

Expectation-Step (estimate $p(z_{d,n} = k)$ from the previous $\theta$, $\phi$):

$p(z_{d,n} = k \mid w_{d,n}) = \dfrac{\theta_{d,k}\, \phi_{k, w_{d,n}}}{\sum_{k'} \theta_{d,k'}\, \phi_{k', w_{d,n}}}$

Maximization-Step (optimize $\theta$, $\phi$ from $p(z_{d,n} = k)$):

$\theta_{d,k} \propto \sum_{n} p(z_{d,n} = k \mid w_{d,n}) \qquad \phi_{k,w} \propto \sum_{d} \sum_{n:\, w_{d,n} = w} p(z_{d,n} = k \mid w_{d,n})$

Note: because of $\theta_d$, which depends on the document $d$, the topic estimate of a word can be different in different documents! This is the Maximum-Likelihood Estimation (MLE).

Incorporating Prior Knowledge

pLSI will model all words, including stopwords! This causes some problems; we therefore can:
▶ Remove stopwords
▶ Add a “background” topic (c.f., [ZhMa16])
▶ Use maximum-a-posteriori (MAP) estimation with a prior word distribution:

$\phi_{k,w} \propto \sum_{d} \sum_{n:\, w_{d,n} = w} p(z_{d,n} = k \mid w_{d,n}) + \mu\, p(w)$

where $\mu$ controls the strength of the prior information, and $p(w)$ is the prior word distribution. [ZhMa16]
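
To make the generative process above concrete, here is a minimal sketch in Python that samples a toy corpus following steps 1–3. The corpus sizes and the use of a Dirichlet to draw $\theta_d$ and $\phi_k$ are illustrative assumptions; the slides do not specify these choices.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not from the slides.
n_topics, n_docs, vocab_size, doc_len = 3, 5, 10, 20

# 1. For every topic k, sample a word distribution phi_k.
#    (Assumption: drawn from a symmetric Dirichlet.)
phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)   # shape (K, V)

# 2. For every document d, sample a topic distribution theta_d.
theta = rng.dirichlet(np.ones(n_topics), size=n_docs)     # shape (D, K)

# 3. For every document d, generate words.
corpus = []
for d in range(n_docs):
    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta[d])    # 3.1 sample a topic z from theta_d
        w = rng.choice(vocab_size, p=phi[z])    # 3.2 sample a word w from phi_z
        words.append(w)
    corpus.append(words)

print(corpus[0])   # word indices of the first toy document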
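
The E-step and M-step of the pLSI EM algorithm, including the MAP variant from the last slide, can be sketched as follows. This is not Hofmann's reference implementation; the function name plsi_em, the corpus format (a list of word-index lists), and the parameters prior_weight and prior_word_dist are my own labels, corresponding to $\mu$ and $p(w)$ in the MAP update.

import numpy as np

def plsi_em(corpus, n_topics, vocab_size, n_iter=50,
            prior_weight=0.0, prior_word_dist=None, seed=0):
    """EM for pLSI. With prior_weight > 0 and a length-V prior_word_dist,
    the M-step for phi becomes the MAP estimate with mu * p(w) pseudo-counts."""
    rng = np.random.default_rng(seed)
    n_docs = len(corpus)
    theta = rng.dirichlet(np.ones(n_topics), size=n_docs)    # theta[d, k] = p(k | d)
    phi = rng.dirichlet(np.ones(vocab_size), size=n_topics)  # phi[k, w]  = p(w | k)

    for _ in range(n_iter):
        theta_new = np.zeros_like(theta)
        phi_new = np.zeros_like(phi)
        for d, words in enumerate(corpus):
            for w in words:
                # E-step: p(z = k | d, w) is proportional to theta[d, k] * phi[k, w].
                p_z = theta[d] * phi[:, w]
                p_z /= p_z.sum()
                # Accumulate expected counts for the M-step.
                theta_new[d] += p_z
                phi_new[:, w] += p_z
        # M-step: re-normalize expected counts (the MLE from the slide).
        theta = theta_new / theta_new.sum(axis=1, keepdims=True)
        if prior_weight > 0 and prior_word_dist is not None:
            # MAP variant: add mu * p(w) pseudo-counts to every topic's word counts.
            phi_new = phi_new + prior_weight * np.asarray(prior_word_dist)
        phi = phi_new / phi_new.sum(axis=1, keepdims=True)
    return theta, phi

# Usage on a tiny hand-made corpus of word indices (illustrative only):
toy_corpus = [[0, 1, 2, 1], [3, 4, 3, 5], [0, 2, 4, 4]]
theta, phi = plsi_em(toy_corpus, n_topics=2, vocab_size=6)

With prior_weight left at 0 this is plain maximum likelihood; because the E-step uses theta[d], the same word can receive different topic estimates in different documents, as the slide points out.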