Word Sense Disambiguation as Classification Problem

Word Sense Disambiguation as Classification Problem
Tanja Gaustad
Alfa-Informatica, University of Groningen, The Netherlands
tanja@let.rug.nl
www.let.rug.nl/ tanja
PUK, South Africa, 2002

Overview
  Introduction to Word Sense Disambiguation (WSD)
  WSD as Classification Problem

What problem are we talking about?
  Problem: Many words have several meanings.
  Example: Mijn vader zagen we niet meer.
    zagen: past tense of zien (to see), i.e. "We did not see my father anymore", or
    zagen: present tense of zagen (to saw), i.e. "We do not saw my father anymore"?

Why do we need Word Sense Disambiguation?
  Importance of WSD: Ambiguous words in a given context have to be resolved for numerous NLP applications, e.g.:
    Machine Translation
    Information Retrieval
    Parsing
    Language Understanding

Example Case
  Alpino: Language understanding system for Dutch; includes
    Hdrug, wide-coverage HPSG grammar for Dutch
    Large-scale lexicon
    Parser
    Disambiguation component
  WSD integrated in Alpino: Reduction of ambiguity through
    Selecting probable reading of given word before parsing
    Checking lexical semantics of output parses

WSD in short
  Problem: Lexical semantic ambiguity
  Goal: Recover correct sense in a given context
  Means:
    Collocational information (context words)
    Distributional information (frequency)
    Further related information (morphology, syntax, topic)
    World knowledge
  Approach: Combine statistics, corpus and linguistics

Statistics and WSD
  Prior probability $P(s_k)$: probability of sense $s_k$ (relative frequency)
  Conditional probability $P(s_k \mid c)$: probability of sense $s_k$ given context word $c$
  Joint probability $P(s_k, c)$: probability of sense $s_k$ occurring together with context word $c$
  N.B. There are typically never enough occurrences to completely specify $P(s_k, c)$.
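The three quantities above can be estimated as relative frequencies from sense-tagged data. The following is a minimal sketch under assumed data: the `observations` list and the function names are hypothetical and do not come from the talk.

```python
from collections import Counter

# Hypothetical sense-tagged occurrences of "accident": (sense, context word).
observations = [
    ("crash", "car"), ("crash", "car"), ("crash", "road"),
    ("crash", "great"), ("chance", "great"), ("chance", "happy"),
]

n = len(observations)
sense_counts = Counter(sense for sense, _ in observations)
context_counts = Counter(context for _, context in observations)
pair_counts = Counter(observations)


def prior(sense):
    """Relative frequency of the sense."""
    return sense_counts[sense] / n


def joint(sense, context):
    """Relative frequency of the sense occurring together with the context word."""
    return pair_counts[(sense, context)] / n


def conditional(sense, context):
    """Probability of the sense given the context word."""
    return pair_counts[(sense, context)] / context_counts[context]


print(prior("crash"), conditional("crash", "car"), joint("crash", "great"))
```

With so few observations most (sense, context word) pairs have a count of zero, which illustrates the sparseness noted in the N.B.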

Statistics and WSD: Example
  accident
    crash (Sense 1): Unfortunate or disastrous incident not caused deliberately; a mishap causing injury or damage; in particular, a crash involving road vehicles.
      Fears that fog could cause a serious accident on the M40 have united members of the District Council.
    chance (Sense 2): Something that happens without apparent or deliberate cause; a chance event or set of circumstances.
      We planned the first two children, but our third was an accident.

Statistics and WSD: Prior vs. conditional probability

  sense    prior prob.   context word c   cond. prob.
  crash    0.82          car              1
                         happy            0
                         historical       0.
                         great            0.5
                         sir              0.5
  chance   0.18          car              0
                         happy            1
                         historical       0.
                         great            0.5
                         sir              0.5

WSD as Classification Problem
  Problem restated: Use statistical information about senses and context words to build a model which correctly predicts word senses.
  Classify input (ambiguous words) into correct classes (senses).
  Algorithm: e.g.
    Naive Bayes
    Maximum Entropy

Classification Algorithm I: Naive Bayes
  Properties: Uses distributional and contextual information
  Training: Bayes' rule
    $P(s_k \mid c) = \frac{P(c \mid s_k) \, P(s_k)}{P(c)}$
    $s_k$: sense $k$ of ambiguous word $w$
    $c$: context words within context window (e.g. 3)
  Testing: Bayes decision rule
    Decide $s'$ if $P(s' \mid c) > P(s_k \mid c)$ for $s_k \neq s'$
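A minimal sketch of a Naive Bayes sense classifier in its usual textbook form: the context words are treated as independent given the sense, and the decision rule picks the sense with the highest (log) score. The training data, the add-one smoothing and all names below are assumptions made for illustration; they are not taken from the talk.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (sense, context words inside the window).
training = [
    ("crash", ["car", "road", "fog"]),
    ("crash", ["car", "injury"]),
    ("crash", ["serious", "road"]),
    ("chance", ["happy", "children"]),
    ("chance", ["planned", "children"]),
]

sense_counts = Counter(sense for sense, _ in training)
word_counts = defaultdict(Counter)
for sense, words in training:
    word_counts[sense].update(words)
vocabulary = {word for _, words in training for word in words}


def log_score(sense, context):
    """log P(sense) + sum of log P(word | sense), with add-one smoothing (an assumption)."""
    total = sum(word_counts[sense].values())
    score = math.log(sense_counts[sense] / len(training))
    for word in context:
        if word not in vocabulary:
            continue  # words never seen in training carry no evidence here
        score += math.log((word_counts[sense][word] + 1) / (total + len(vocabulary)))
    return score


def disambiguate(context):
    """Bayes decision rule: choose the sense with the highest score."""
    return max(sense_counts, key=lambda sense: log_score(sense, context))


print(disambiguate(["car", "fog"]))
print(disambiguate(["children"]))
```

The denominator $P(c)$ of Bayes' rule is the same for every sense, so it can be dropped when only the best sense is needed.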

Classification Algorithm I: Naive Bayes
  Input: Corpus
  Training: For every ambiguous word, build training file containing:
    prior probability of all senses
    cond. probability of all senses given possible context words
  Testing: For ambiguous word compute sense with highest score

Classification Algorithm I: Naive Bayes
  My mother already had a few car accidents.

  sense    prior prob.   context word c   cond. prob.
  crash    0.82          car              1
  chance   0.18          car              0

  score for crash: 0.82 + 1 = 1.82
  score for chance: 0.18 + 0 = 0.18
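The arithmetic of this example can be reproduced directly. The sketch below follows the additive score shown on the slide (prior plus conditional probability for each context word), not the product form of a textbook Naive Bayes model; the function name `slide_score` is hypothetical.

```python
# Values copied from the example table.
prior = {"crash": 0.82, "chance": 0.18}
conditional = {("crash", "car"): 1.0, ("chance", "car"): 0.0}


def slide_score(sense, context_words):
    """Additive score as on the slide: prior + conditional probability per context word."""
    return prior[sense] + sum(conditional[(sense, word)] for word in context_words)


scores = {sense: slide_score(sense, ["car"]) for sense in prior}
best = max(scores, key=scores.get)
print(scores, best)
```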

Classification Algorithm II: Maximum Entropy
  Maximum Entropy Principle: In the absence of additional information, all events should have equal probability
  Entropy: Self-information; measures the amount of information contained in a random variable
  Constraints: Imposed by training data; basically a combination of features and corresponding weights
  Goal: Search for the distribution that maximises entropy while satisfying the constraints imposed by the training data
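A small sketch of the entropy measure, illustrating why the uniform distribution is the one preferred when nothing else is known. The two distributions are illustrative; the skewed one reuses the priors 0.82 and 0.18 from the accident example.

```python
import math


def entropy(distribution):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


uniform = [0.5, 0.5]    # no information about the two senses
skewed = [0.82, 0.18]   # priors of the two senses of "accident"

print(entropy(uniform), entropy(skewed))
```

For two outcomes the uniform distribution has the highest possible entropy; any constraint that moves probability towards one sense lowers it.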

Classification Algorithm II: Maximum Entropy
  A distribution $p(c \mid x)$ is used to find the class $c$ for event $x$
  $f_i(x, c)$ is the number of times rule $i$ applies for event $x$ and class $c$
  The weights are chosen to maximise the entropy of $p$
  Properties:
    General technique for estimating probability distributions from data
    Allows heterogeneous information sources to be integrated
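For reference, a common log-linear way of writing a maximum entropy classifier is given below. It is the standard textbook parameterisation, stated here as an assumption rather than as the formula used in the talk; $x$ is the event, $c$ a class, $f_i$ a feature and $\lambda_i$ its weight.

```latex
p(c \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i \, f_i(x, c) \Big),
\qquad
Z(x) = \sum_{c'} \exp\Big( \sum_i \lambda_i \, f_i(x, c') \Big)
```

Under this form the class with the largest weighted feature sum is also the class with the highest probability.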

Classification Algorithm II: Maximum Entropy
  Training:
    Select set of features, e.g. lemma, part of speech, syntactic relation(s), topical information
    Compute weight for each feature (feature + weight = constraint)
    Compute model with maximal entropy which satisfies set of constraints
  Testing: Classify test data according to model
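The training and testing steps can be sketched as follows. The talk does not name an estimation procedure, so this sketch swaps in plain stochastic gradient ascent on the conditional log-likelihood of a log-linear model as a stand-in; the events, feature names, learning rate and number of epochs are all hypothetical.

```python
import math

# Hypothetical training events: (active features, correct sense).
events = [
    ({"lemma=accident", "context=car"}, "crash"),
    ({"lemma=accident", "context=road"}, "crash"),
    ({"lemma=accident", "context=children"}, "chance"),
    ({"lemma=accident", "context=planned"}, "chance"),
]
classes = ["crash", "chance"]
weights = {}  # (feature, class) -> weight; missing entries count as 0.0


def probabilities(features):
    """Log-linear model: softmax over the summed weights of the active features."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in features) for c in classes}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


def train(epochs=200, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood (assumed method)."""
    for _ in range(epochs):
        for features, gold in events:
            probs = probabilities(features)
            for c in classes:
                # Gradient per active feature: observed count minus expected count.
                gradient = (1.0 if c == gold else 0.0) - probs[c]
                for f in features:
                    weights[(f, c)] = weights.get((f, c), 0.0) + rate * gradient


def classify(features):
    """Testing: return the most probable class under the trained model."""
    probs = probabilities(features)
    return max(probs, key=probs.get)


train()
print(classify({"lemma=accident", "context=car"}))
```

The feature `lemma=accident` is active in every event, so it cannot separate the senses on its own; the context features end up carrying the decision.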

Classification Algorithm II: Maximum Entropy
  My mother already had a few car accidents.

  Feature vector:
    wordform     lemma      context word   class
    accidents    accident   car            crash

  f(accidents, crash) = 0.7     f(accidents, chance) = 0.2
  f(accident, crash) = 1.5      f(accident, chance) = 0.9
  f(car, crash) = 3.5           f(car, chance) = -2.2

  score for crash: 0.7 + 1.5 + 3.5 = 5.7
  score for chance: 0.2 + 0.9 + (-2.2) = -1.1
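The two scores can be recomputed from the listed values. The final normalisation into probabilities is an added step that assumes the standard log-linear (softmax) form; the slide itself stops at the raw scores.

```python
import math

# Values copied from the example; keys are (feature value, class).
weights = {
    ("accidents", "crash"): 0.7, ("accidents", "chance"): 0.2,
    ("accident", "crash"): 1.5, ("accident", "chance"): 0.9,
    ("car", "crash"): 3.5, ("car", "chance"): -2.2,
}
features = ["accidents", "accident", "car"]  # wordform, lemma, context word
classes = ["crash", "chance"]

# Score of a class: sum of the weights of the active features for that class.
scores = {c: sum(weights[(f, c)] for f in features) for c in classes}

# Assumed normalisation step: softmax over the scores.
z = sum(math.exp(s) for s in scores.values())
probs = {c: math.exp(s) / z for c, s in scores.items()}

best = max(scores, key=scores.get)
print(scores, best)
```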

Conclusion
  WSD: Important problem to be solved for successful NLP applications
  Classification: Complex statistical models allow WSD to be restated as a classification problem
  Future Work: Assess the use of different sources of information in statistical classification models