Naive Bayes Classifier Approach to Word Sense Disambiguation

Naive Bayes Classifier Approach to Word Sense Disambiguation
Daniel Jurafsky and James H. Martin, Chapter 20: Computational Lexical Semantics, Sections 1 to 2
Seminar in Methodology and Statistics, 3 June 2009

Outline
1. Word Sense Disambiguation (WSD): What is WSD? Variants of WSD
2. Naive Bayes Classifier: Statistics difficulty; Getting around the problem; Assumption; Substitution; Intuition of the naive Bayes classifier for WSD
3. Conclusion

What is WSD?
WSD is the task of automatically assigning the appropriate meaning to a polysemous word within a given context. Polysemy is the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings. Here WSD is discussed in relation to computational lexical semantics.

Example of a polysemous word
[Figure: Example sentences of the polysemous word bar]

Variants of generic WSD
Many WSD algorithms rely on contextual similarity to help choose the proper sense of a word in context. Two variants of WSD are:
1. the all-words approach, and
2. the supervised (lexical sample) approach.

Unsupervised (all-words) WSD approach
The system is given entire texts and a lexicon with an inventory of senses for each entry, and is required to disambiguate every content word in the text. Disadvantages:
1. Training data for each word in the test set may not be available.
2. Training one classifier per term is not practical.

Supervised (lexical sample) WSD approach
Takes as input a word in context along with a fixed inventory of potential word senses, and outputs the correct word sense for that use. The input data is hand-labeled with correct word senses. Unlabeled target words in context can then be labeled using such a trained classifier.

Collecting features for supervised WSD
Inputs for supervised WSD are collected in feature vectors. A feature vector consists of numeric or nominal values that encode linguistic information as input to most ML algorithms. Two classes of features extracted from the neighbouring context are:
1. bag-of-words features, and
2. collocational features.

Classes of feature vectors: bag-of-words features
These are an unordered set of words, with their exact position ignored; a small sketch follows below.
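To make the representation concrete, here is a minimal Python sketch (not part of the original slides; the vocabulary and context sentence are invented for illustration) of a bag-of-words feature vector over a small, pre-chosen vocabulary:

from collections import Counter

# Minimal sketch: a bag-of-words feature vector over a small, hand-picked
# vocabulary. Word order and position are ignored; only how often each
# vocabulary word occurs in the context window matters.
VOCAB = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "guitar", "band"]

def bag_of_words_vector(context_words, vocab=VOCAB):
    counts = Counter(w.lower() for w in context_words)
    return [counts[v] for v in vocab]

context = "an electric guitar and bass player stand off to one side".split()
print(bag_of_words_vector(context))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]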

Classes of feature vectors: collocational features
A collocation is a word or phrase in a position of specific relationship to a target word; a collocational feature thus encodes information about specific positions to the left or right of the target word. For example, take bass as the target word in "An electric guitar and bass player stand off to one side, ...". A collocational feature vector extracted from a window of two words to the left and right of the target word, made up of the words themselves and their respective POS tags, that is
[w_{i-2}, POS_{i-2}, w_{i-1}, POS_{i-1}, w_{i+1}, POS_{i+1}, w_{i+2}, POS_{i+2}]
would yield the following vector:
[guitar, NN, and, CC, player, NN, stand, VB]
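A minimal Python sketch of this extraction (the tokenization and POS tags below are assumed for illustration and are not from the original slides):

# Extract the words and POS tags in a +/-2 window around the target word.
def collocational_features(tokens, pos_tags, target_index, window=2):
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = target_index + offset
        if 0 <= j < len(tokens):
            feats.extend([tokens[j], pos_tags[j]])
        else:
            feats.extend(["<pad>", "<pad>"])  # window falls outside the sentence
    return feats

tokens = ["an", "electric", "guitar", "and", "bass", "player", "stand", "off"]
pos    = ["DT", "JJ", "NN", "CC", "NN", "NN", "VB", "RP"]
print(collocational_features(tokens, pos, tokens.index("bass")))
# -> ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']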

Naive Bayes classifier
Because of the feature-vector annotations we can use a naive Bayes classifier approach to WSD. This approach is based on the premise that choosing the best sense \hat{s} out of the set of possible senses S for a feature vector \vec{f} amounts to choosing the most probable sense given that vector, that is:
\hat{s} = \arg\max_{s \in S} P(s \mid \vec{f})    (1)

Statistics difficulty
Collecting reasonable statistics for the equation above is difficult. For example, a binary bag-of-words vector defined over a vocabulary of 20 words would have
2^{20} = 1{,}048{,}576    (2)
possible feature vectors.

Getting around the problem
Equation 1 is reformulated in the usual Bayesian manner:
\hat{s} = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\, P(s)}{P(\vec{f})}    (3)
Data that associates a specific \vec{f} with each sense is sparse, but information about individual feature-value pairs in the context of specific senses is available in a tagged training set.

Assumption
We naively assume that the features are independent of one another, i.e. that they are conditionally independent given the word sense, yielding the following approximation for P(\vec{f} \mid s):
P(\vec{f} \mid s) \approx \prod_{j=1}^{n} P(f_j \mid s)    (4)
The probability of an entire vector given a sense can thus be estimated as the product of the probabilities of its individual features given that sense.

Naive Bayes classifier for WSD
Since P(\vec{f}) is the same for all possible senses, it does not affect the final ranking of senses. Substituting the approximation for P(\vec{f} \mid s) into equation 3 leaves us with the following formulation:
\hat{s} = \arg\max_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s)    (5)
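As a minimal sketch (not from the original slides), equation 5 can be implemented as below, assuming the probability tables prior[s] = P(s) and cond[s][f] = P(f | s) have already been estimated; log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow, with a tiny floor for unseen features:

import math

def choose_sense(features, prior, cond):
    # Score each candidate sense with log P(s) + sum_j log P(f_j | s)
    # and return the highest-scoring one (equation 5 in log space).
    best_sense, best_score = None, float("-inf")
    for s in prior:
        score = math.log(prior[s])
        for f in features:
            score += math.log(cond[s].get(f, 1e-10))  # floor for unseen features
        if score > best_score:
            best_sense, best_score = s, score
    return best_sense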

Training a naive Bayes classifier
We can estimate each of the probabilities in equation 5 as shown below.
Prior probability of each sense, P(s): the fraction of the training instances of the word that carry sense s_i, i.e.
P(s_i) = \frac{count(s_i, w_j)}{count(w_j)}    (6)
Individual feature probabilities, P(f_j \mid s):
P(f_j \mid s) = \frac{count(f_j, s)}{count(s)}    (7)
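A minimal sketch of these count-based estimates (the input format of (feature_list, sense) pairs is assumed for illustration; it is not specified in the original slides):

from collections import Counter, defaultdict

def train_naive_bayes(labeled_examples):
    # Count sense occurrences and feature occurrences per sense, then turn
    # the counts into the relative frequencies of equations 6 and 7.
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, sense in labeled_examples:
        sense_counts[sense] += 1
        for f in features:
            feature_counts[sense][f] += 1
    total = sum(sense_counts.values())
    prior = {s: sense_counts[s] / total for s in sense_counts}  # equation 6
    cond = {s: {f: c / sense_counts[s]                           # equation 7
                for f, c in feature_counts[s].items()}
            for s in sense_counts}
    return prior, cond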

Intuition of the naive Bayes classifier for WSD
1. Take a target word in context.
2. Extract the specified features, e.g. neighbouring words, POS tags, positions.
3. Compute P(s) \prod_{j=1}^{n} P(f_j \mid s) for each sense.
4. Return the sense with the highest score.
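Putting the two sketches above together, a toy run on invented data (the senses, feature words, and counts are made up for illustration and are not from the slides) might look like this:

# Two invented senses of "bass", each described by a few context-word features.
train = [
    (["guitar", "player", "band"],   "bass_music"),
    (["play", "band", "sound"],      "bass_music"),
    (["fishing", "river", "caught"], "bass_fish"),
    (["fish", "pound", "lake"],      "bass_fish"),
]
prior, cond = train_naive_bayes(train)
print(choose_sense(["guitar", "band"], prior, cond))   # expected: bass_music
print(choose_sense(["fishing", "lake"], prior, cond))  # expected: bass_fish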

Conclusion
We discussed the naive Bayes classifier for WSD, which is based on Bayes' theorem, and showed that it is possible to disambiguate word senses in context. We have not discussed the evaluation of such systems, nor the disambiguation of phrases. To find out, come to my TabuDag presentation.