Probabilistic Latent Semantic Analysis


Probabilistic Latent Semantic Analysis
Thomas Hofmann
Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course Data Mining & Exploration

Outline
Latent Semantic Analysis
  o Need
  o Overview
  o Drawbacks
Probabilistic Latent Semantic Analysis
  o Solution to drawbacks of LSA
  o Comparison with LSA and document clustering
  o Model construction
Evaluation of PLSA

Need for Latent Semantic Analysis
Applications
  o Compare documents in the semantic (concept) space
  o Relations between terms
  o Compare documents across languages
  o Given: bag of words. Find: matching documents in the semantic space
Problems addressed
  o Synonymy, e.g. buy - purchase
  o Polysemy, e.g. book (verb) - book (noun)

LSA: Overview
Capturing the meaning among words
Addressing polysemy and synonymy
Key idea
  o Dimensionality reduction of the word-document co-occurrence matrix
  o Construction of a latent semantic space
From: Documents <-> Words
To: Documents <-> Concepts <-> Words
LSA may group documents together even if they have no words in common!

LSA: Concept
Singular Value Decomposition (SVD)
Given N, the word-document co-occurrence matrix, compute:
N = U Σ V^T
where:
  o Σ is the diagonal matrix of the singular values of N
  o U, V are two orthogonal matrices
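The decomposition on this slide can be tried on a toy matrix. The sketch below assumes NumPy and a made-up 4-word by 3-document count matrix; the numbers are purely illustrative and not taken from the presentation.

```python
import numpy as np

# Hypothetical word-document co-occurrence matrix N
# (rows: words, columns: documents).
N = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

# Thin SVD: N = U @ diag(s) @ Vt, with the singular values in s
# returned in decreasing order.
U, s, Vt = np.linalg.svd(N, full_matrices=False)

# Multiplying the three factors back together recovers N.
reconstruction = U @ np.diag(s) @ Vt
print(np.allclose(reconstruction, N))
```

With `full_matrices=False`, `U` has orthonormal columns rather than being a full square orthogonal matrix, which is all that LSA needs.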

LSA: SVD [figure]

LSA: Concept
Dimensionality reduction
  o Keep the K largest singular values, which correspond to the dimensions with the greatest variance between words and documents
  o Discarding the smallest singular values is assumed to be equivalent to reducing the "noise"
  o Terms and documents are converted to points in a K-dimensional latent space
  o The results do not yield well-defined probabilities and are thus difficult to interpret
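The rank-K truncation described above might be sketched as follows. The function name `lsa_truncate` and the choice to scale coordinates by the singular values are assumptions made for illustration; other scalings are also used in practice.

```python
import numpy as np

def lsa_truncate(N, K):
    """Keep the K largest singular values of the word-document matrix N."""
    U, s, Vt = np.linalg.svd(N, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :K], s[:K], Vt[:K, :]
    term_coords = U_k * s_k       # one K-dimensional point per word
    doc_coords = Vt_k.T * s_k     # one K-dimensional point per document
    N_k = U_k @ np.diag(s_k) @ Vt_k   # rank-K approximation of N
    return term_coords, doc_coords, N_k

# Illustrative toy matrix (4 words, 3 documents).
N = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])
term_coords, doc_coords, N_k = lsa_truncate(N, K=2)
print(term_coords.shape, doc_coords.shape)
```

The entries of `N_k` can be negative, which illustrates the last point on the slide: the reduced representation has no direct probabilistic reading.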

Probabilistic LSA: Overview
Developed to address automated document indexing
Same concept as LSA
  o Dimensionality reduction
  o Construction of a latent space
BUT with sound statistical foundations
  o Well-defined probabilities
  o Interpretable results

Probabilistic LSA: Aspect Model
Generative model based on the aspect model
  o Latent variables z are introduced and related to the documents d
  o |z| << |d|: there are far fewer latent variables than documents, since the same z_i may be associated with more than one document
  o z acts as a bottleneck and results in dimensionality reduction
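One way to read the generative story is as a three-step sampler: pick a document d, pick a latent class z given d, then pick a word w given z. The sketch below assumes NumPy; the function name `sample_pairs` and the toy distributions are hypothetical and only illustrate the bottleneck role of z.

```python
import numpy as np

def sample_pairs(p_d, p_z_given_d, p_w_given_z, n_samples, seed=0):
    """Draw (document, word) index pairs from the aspect model."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_samples):
        d = rng.choice(len(p_d), p=p_d)                          # document ~ P(d)
        z = rng.choice(p_z_given_d.shape[1], p=p_z_given_d[d])   # class ~ P(z|d)
        w = rng.choice(p_w_given_z.shape[1], p=p_w_given_z[z])   # word ~ P(w|z)
        pairs.append((int(d), int(w)))
    return pairs

# Toy parameters: 2 documents, 2 latent classes, 3 words.
p_d = np.array([0.5, 0.5])
p_z_given_d = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
p_w_given_z = np.array([[0.7, 0.3, 0.0],
                        [0.0, 0.2, 0.8]])
print(sample_pairs(p_d, p_z_given_d, p_w_given_z, n_samples=5))
```

The word is drawn from P(w|z) only, so everything the model knows about a document has to pass through its class weights.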

Probabilistic LSA: Model
Multinomial mixtures
P(d, w) = P(d) P(w|d)
P(w|d) = Σ_z P(w|z) P(z|d)
  o P(w|z): multinomials
  o P(z|d): mixing weights
The joint probability P(d, w) is the probability of observing word w in document d
Word distributions P(w|d) are combinations of the factors P(w|z) with the mixing weights P(z|d)
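As a small numerical illustration, the mixture of multinomials can be written as a single matrix product. The sizes and the randomly drawn parameters below are assumptions chosen for the example, not values from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 4, 6, 2

# P(w|z): one multinomial over words per latent class (each row sums to 1).
p_w_given_z = rng.dirichlet(np.ones(n_words), size=n_topics)
# P(z|d): mixing weights of each document (each row sums to 1).
p_z_given_d = rng.dirichlet(np.ones(n_topics), size=n_docs)
# P(d): taken as uniform here, purely for illustration.
p_d = np.full(n_docs, 1.0 / n_docs)

# P(w|d) = sum_z P(w|z) P(z|d), computed for all documents at once.
p_w_given_d = p_z_given_d @ p_w_given_z
# P(d, w) = P(d) P(w|d).
p_dw = p_d[:, None] * p_w_given_d

print(p_w_given_d.sum(axis=1))  # each row is a distribution over words
```

Each row of `p_w_given_d` is a convex combination of the rows of `p_w_given_z`, which is exactly the sense in which word distributions are combinations of the factors.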

Probabilistic LSA: Model
Conditional independence assumption
  o Documents and words are independent given z: P(d, w|z) = P(d|z) P(w|z)
Thus, equivalently:
P(d, w) = Σ_z P(z) P(d|z) P(w|z)
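The equivalence of the two parameterizations follows from Bayes' rule, P(z|d) = P(d|z) P(z) / P(d). A quick numerical check, assuming NumPy and randomly drawn toy parameters, could look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n_docs, n_words, n_topics = 5, 7, 3

p_z = rng.dirichlet(np.ones(n_topics))                        # P(z)
p_d_given_z = rng.dirichlet(np.ones(n_docs), size=n_topics)   # P(d|z), shape (K, D)
p_w_given_z = rng.dirichlet(np.ones(n_words), size=n_topics)  # P(w|z), shape (K, W)

# Symmetric form: P(d, w) = sum_z P(z) P(d|z) P(w|z).
p_dw_symmetric = np.einsum("k,kd,kw->dw", p_z, p_d_given_z, p_w_given_z)

# Asymmetric form: P(d, w) = P(d) sum_z P(w|z) P(z|d),
# with P(d) and P(z|d) obtained from the symmetric parameters.
p_d = p_z @ p_d_given_z
p_z_given_d = (p_d_given_z * p_z[:, None]).T / p_d[:, None]
p_dw_asymmetric = p_d[:, None] * (p_z_given_d @ p_w_given_z)

print(np.allclose(p_dw_symmetric, p_dw_asymmetric))
```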

Probabilistic LSA: Model Fitting
Expectation Maximization (EM)
Standard procedure for latent variable models
  o E-step: compute the posteriors of the latent variables z
  o M-step: update the parameters
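A compact sketch of the two steps for the symmetric parameterization P(z), P(d|z), P(w|z) is given below. It assumes NumPy and a dense (documents x words) count matrix small enough to hold a full (K, D, W) posterior array in memory; the function name `plsa_em`, the random initialization, and the toy counts are illustrative choices rather than the presenters' implementation.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a (documents x words) count matrix with EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_given_z = rng.dirichlet(np.ones(n_docs), size=n_topics)    # (K, D)
    p_w_given_z = rng.dirichlet(np.ones(n_words), size=n_topics)   # (K, W)
    observed = counts > 0
    log_likelihoods = []
    for _ in range(n_iter):
        # joint[k, d, w] = P(z_k) P(d|z_k) P(w|z_k)
        joint = p_z[:, None, None] * p_d_given_z[:, :, None] * p_w_given_z[:, None, :]
        p_dw = joint.sum(axis=0)
        log_likelihoods.append(float(np.sum(counts[observed] * np.log(p_dw[observed]))))
        # E-step: posterior P(z|d, w) for every (d, w) pair.
        posterior = joint / np.maximum(p_dw, 1e-300)
        # M-step: re-estimate the parameters from expected counts n(d, w) P(z|d, w).
        weighted = counts[None, :, :] * posterior
        topic_mass = weighted.sum(axis=(1, 2))
        p_d_given_z = weighted.sum(axis=2) / topic_mass[:, None]
        p_w_given_z = weighted.sum(axis=1) / topic_mass[:, None]
        p_z = topic_mass / topic_mass.sum()
    return p_z, p_d_given_z, p_w_given_z, log_likelihoods

# Toy counts: documents 0-1 and 2-3 use mostly different words.
toy_counts = np.array([
    [4.0, 3.0, 0.0, 0.0],
    [3.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 3.0],
    [0.0, 0.0, 3.0, 5.0],
])
p_z, p_d_given_z, p_w_given_z, log_likelihoods = plsa_em(toy_counts, n_topics=2)
print(log_likelihoods[0], log_likelihoods[-1])
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a convenient sanity check for any EM implementation.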

Probabilistic LSA: Space
Sub-simplex dimensionality: K-1 << D-1

Tempered EM
Avoids overfitting the training data
Introduces a regularization term β

Tempered EM: Concept
  o Add a term β < 1 in the E-step; it dampens the posterior probabilities that are used in the M-step
  o Faster model fitting than other methods (e.g. annealing)
  o Perform EM iterations, then decrease β, and continue until performance on held-out data deteriorates
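A sketch of a tempered E-step is shown below, assuming the form in which the likelihood part P(d|z) P(w|z) is raised to the power β before normalizing, as in Hofmann's tempered EM. The function name `tempered_posterior` and the toy parameters are illustrative.

```python
import numpy as np

def tempered_posterior(p_z, p_d_given_z, p_w_given_z, beta):
    """P_beta(z|d, w) proportional to P(z) * [P(d|z) P(w|z)] ** beta."""
    likelihood = p_d_given_z[:, :, None] * p_w_given_z[:, None, :]   # (K, D, W)
    unnormalized = p_z[:, None, None] * likelihood ** beta
    return unnormalized / unnormalized.sum(axis=0, keepdims=True)

rng = np.random.default_rng(2)
p_z = rng.dirichlet(np.ones(3))
p_d_given_z = rng.dirichlet(np.ones(4), size=3)
p_w_given_z = rng.dirichlet(np.ones(5), size=3)

standard = tempered_posterior(p_z, p_d_given_z, p_w_given_z, beta=1.0)
damped = tempered_posterior(p_z, p_d_given_z, p_w_given_z, beta=0.7)
print(standard[:, 0, 0], damped[:, 0, 0])
```

With β = 1 this reduces to the ordinary E-step; as β shrinks toward 0 the posterior ignores the data and falls back to P(z), which is the damping effect the slide refers to.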

PLSA vs LSA
Major advantages of PLSA on the modeling side
  o Well-defined probabilities
  o Interpretable directions in the probabilistic latent semantic space, as multinomial word distributions
  o Better model selection and complexity control (TEM)
Important drawbacks of LSA on the same side
  o No properly normalized probabilities
  o No obvious interpretation of the directions of the latent semantic space
  o Selection of dimensions based on ad hoc heuristics
Potential computational advantage of LSA over PLSA (SVD vs EM, which is an iterative method)

Aspect Model vs Clusters
[Figure: document clustering (documents -> cluster) vs aspect model (documents -> aspects)]
PLSA: documents are not tied to a single cluster -> flexibility, effective modeling

Evaluation: Perplexity
Perplexity measures how well a probability distribution can make predictions
Low perplexity -> more certain predictions, better model
PLSA evaluation method
  o Extract probabilities from LSA
  o Unigram model as baseline
PLSA evaluation results
  o PLSA better than LSA
  o TEM better than EM
  o PLSA allows |Z| > rank(N) (N is the co-occurrence matrix)
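Perplexity can be computed as the exponential of the negative average log-probability per observed word. The sketch below assumes NumPy, a (documents x words) count matrix, and a model that supplies P(w|d) for every document; the function name and toy data are illustrative.

```python
import numpy as np

def perplexity(counts, p_w_given_d):
    """exp(- sum_{d,w} n(d,w) log P(w|d) / sum_{d,w} n(d,w))."""
    observed = counts > 0   # skip zero counts so log(0) never contributes
    log_likelihood = np.sum(counts[observed] * np.log(p_w_given_d[observed]))
    return float(np.exp(-log_likelihood / counts.sum()))

counts = np.array([[3.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0, 2.0]])

# A model that spreads probability uniformly over the 4 words.
uniform_model = np.full(counts.shape, 0.25)
# A model that matches each document's empirical word frequencies.
fitted_model = counts / counts.sum(axis=1, keepdims=True)

print(perplexity(counts, uniform_model))   # 4.0: as uncertain as a 4-way guess
print(perplexity(counts, fitted_model))    # lower, since it predicts the words better
```

A uniform model over V words always has perplexity V, which gives a useful reference point when comparing models.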

Evaluation: Automatic Indexing
Given a short document (query q), find the most relevant documents
  o Baseline term matching s(d,q): cosine scoring combined with term frequencies
  o LSA: linear combination of s(d,q) and the score derived from the latent space
  o PLSA: similarity of P(z|d) and P(z|q)
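The baseline cosine scoring can be sketched as below, assuming NumPy and raw term-frequency vectors. The same function could in principle be applied to the rows P(z|d) and the vector P(z|q) to compare documents and queries in the latent space; using cosine there is an assumption of this sketch, since the slide does not name the similarity measure.

```python
import numpy as np

def cosine_scores(doc_vectors, query_vector):
    """Cosine similarity between each row of doc_vectors and query_vector."""
    doc_norms = np.linalg.norm(doc_vectors, axis=1)
    query_norm = np.linalg.norm(query_vector)
    denominator = np.maximum(doc_norms * query_norm, 1e-12)  # guard against zero vectors
    return doc_vectors @ query_vector / denominator

# Toy term-frequency vectors over a 3-word vocabulary.
docs = np.array([[1.0, 0.0, 2.0],
                 [0.0, 3.0, 0.0],
                 [2.0, 1.0, 1.0]])
query = np.array([1.0, 0.0, 2.0])

scores = cosine_scores(docs, query)
ranking = np.argsort(-scores)   # document indices, most similar first
print(scores, ranking)
```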

Evaluation: Precision & Recall
Precision & recall: popular measures in information retrieval
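For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch in plain Python, with hypothetical document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 3 relevant overall, 2 in common.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall)
```

Retrieving more documents tends to raise recall and lower precision, which is why retrieval methods are usually compared by their precision at several levels of recall.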

Evaluation: Precision & Recall
For intermediate values of recall, the precision of PLSA is almost 100% better than that of the baseline method!

Evaluation: Polysemy
Results show the advantage of PLSA in handling polysemy

Conclusion
  o Documents are represented as vectors of word frequencies
  o Syntactic relations and word order are ignored, but co-occurrences still provide useful semantic insights about the document topics
  o PLSA is a generative model based on this idea; it can be used to extract topics from a collection of documents
  o PLSA significantly outperforms LSA thanks to its probabilistic basis

References
  o D.M. Blei, A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res., vol. 3, 2003, pp. 993-1022.
  o T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, Machine Learning, vol. 42, Jan. 2001, pp. 177-196.
  o T. Hofmann, Probabilistic latent semantic analysis, in Proc. of Uncertainty in Artificial Intelligence, UAI 99, 1999, pp. 289-296.
  o S. Deerwester, S. Dumais, T. Landauer, G. Furnas, and R. Harshman, Indexing by latent semantic analysis, J. Amer. Soc. Info. Sci., vol. 41, 1990, pp. 391-407.