16-Non-Negative-Matrix-Factorization.pdf

Full Transcript

Non-negative Matrix Factorization (NMF, NNMF) [LeSe99] Given a data matrix the parameter with find two matrixes , and , such that Variant 1: Squared error – minimize Variant 2: Divergence – minimize the log-likelihood of 10 NMF: Multiplicative Update for Squared Error Gradient descent method for squ...

Non-negative Matrix Factorization (NMF, NNMF) [LeSe99] Given a data matrix the parameter with find two matrixes , and , such that Variant 1: Squared error – minimize Variant 2: Divergence – minimize the log-likelihood of 10 NMF: Multiplicative Update for Squared Error Gradient descent method for squared error [LeSe00; LeSe99]: 1. initialize with random non-negative values (later: with better heuristics) 2. maximize : 3. maximize : 4. repeat 2-3 until the improvement of is below a threshold May get stuck in a local fixpoint or saddlepoint – does not guarantee to find the optimum. Note: we need , , so it is common to add a small constant to all values. 11 NMF: Multiplicative Update for KL-Divergence Gradient descent method for KL-Divergence [LeSe00; LeSe99]: 1. initialize randomly (or with better heuristics) 2. maximize : 3. maximize : 4. repeat 2-3 until the improvement of is below a threshold May get stuck in a local fixpoint or saddlepoint – does not guarantee to find the optimum. Note: we need , , so it is common to add a small constant to all values. Other Algorithms for NMF When these algorithms are implemented literally, they are slow: dense matrix multiplications (in ) per iteration Many algorithm variants have been developed: ▶ Alternating Least Squares [PaTa94] ▶ Projected-gradient ALS [Lin07] ▶ Quasi-Newton optimization [ZdCi06] ▶ Hierarchical Alternating Least Squares [CiPh09] ▶ Initialize with spherical- -means ▶ Initialize with SVD [BBLP07] ▶ Initialization survey: [LMAC14] ▶ Beta divergence [FéId11] ▶ Non-negative tensor factorization [CiPh09] ▶… 13 pLSI [Hofm99] is Non-Negative Matrix Factorization [GaGo05] Probabilistic Latent Semantic Indexing (c.f. Part 4) is a probabilistic version of LSI/LSA. PLSI used a graphical mixture model with (mixture model notation; different than in Part 4) ▶ : word probability for each topic ▶ : topic probability for each document ▶ probabilities are non-negative “Any (local) maximum likelihood solution of PLSA is a solution of NMF with KL divergence.” [GaGo05] 14 Problems of NMF Interesting theoretical analysis: David L. Donoho, and Victoria Stodden. When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? In: NIPS, 8, 2003 Observations: ▶ decomposition is not orthogonal (in contrast to SVD, PCA) ▶ decomposition is not unique (not even when is achieved; not only permutations and scaling of components) ▶ in particular, invariant regions are decomposed arbitrarily (stop words invariants of the data) ▶ 0 values are problematic with multiplicative update strategy Need additional constraints, such as sparsity regularization [EgKo04; Hoye04]. 16

Use Quizgecko on...
Browser
Browser