Word Sense Disambiguation

Word Sense Disambiguation. Computational Lexical Semantics. Gemma Boleda (Universitat Politècnica de Catalunya) and Stefan Evert (University of Osnabrück). ESSLLI, Bordeaux, France, July 2009. 1 / 56

Thanks. These slides are based on Jurafsky & Martin (2004: chapter 20) and material by Ann Copestake (course at UPF, 2008). 2 / 56

Outline Overview 1 Overview 2 3 4 5 3 / 56

Overview. Word Sense Disambiguation: the task of selecting the correct sense for a word in context. Potentially helpful in many applications: machine translation, question answering, information retrieval, ... We focus on WSD as a stand-alone task (artificial!). 5 / 56

WSD algorithm, basic form:
input: a word in context and a fixed inventory of word senses
output: the correct word sense for that use
Which context? Words surrounding the target word: annotated? just the words, in no particular order? context size?
Which inventory? Task-dependent:
machine translation from English to Spanish: the set of Spanish translations
speech synthesis: homographs with differing pronunciations (e.g., bass)
stand-alone task: a lexical resource (usually WordNet)
6 / 56

An example
WordNet Sense   Target Word in Context
bass4           ... fish as Pacific salmon and striped bass and ...
bass4           ... produce filets of smoked bass or sturgeon ...
bass7           ... exciting jazz bass player since Ray Brown ...
bass7           ... play bass because he doesn't have to solo ...
Figure: Possible inventory of sense tags for the word bass. 7 / 56

Variants of the task
lexical sample task: WSD for a small set of target words; a number of corpus instances are selected and labeled (similar to the task in our case study); supervised approaches; word-specific classifiers
all-words: WSD for all content words in a text; similar to POS-tagging, but with a very large tagset! data sparseness: not enough training data for every word
8 / 56

Outline Overview 1 Overview 2 3 4 5 9 / 56

Feature extraction
supervised approach: need to identify features that are predictive of word senses
fundamental (and early) insight: look at the context of the target word, e.g. a 1-word window: ... smoked bass or ... / ... jazz bass player ...
10 / 56

Method
process the dataset (POS-tagging, lemmatization, parsing)
build a feature representation encoding the relevant linguistic information
two main feature types: (1) collocational features, (2) bag-of-words features
11 / 56

Collocational features: features that take order or syntactic relations into account, restricted to the immediate context of the target word (usually a fixed window). For example: lemma and part of speech within a two-word window; syntactic function of the target word. 12 / 56

Collocational features: Example
Example (20.1): An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
2-word window representation, using parts of speech:
[guitar, NN, and, CC, player, NN, stand, VB]
[w-2, P-2, w-1, P-1, w+1, P+1, w+2, P+2]
13 / 56
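A minimal Python sketch (not from the slides) of how such a window representation could be built, assuming the sentence is already POS-tagged as a list of (word, POS) pairs; the function name collocational_features is invented for illustration:

def collocational_features(tagged, target_index, window=2):
    # Build [w-2, P-2, w-1, P-1, w+1, P+1, w+2, P+2] around the target,
    # padding with None when the window extends past the sentence edges.
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        word, pos = tagged[i] if 0 <= i < len(tagged) else (None, None)
        features.extend([word, pos])
    return features

# Example (20.1), target "bass" at index 2:
tagged = [("guitar", "NN"), ("and", "CC"), ("bass", "NN"),
          ("player", "NN"), ("stand", "VB")]
print(collocational_features(tagged, 2))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']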

Bag-of-words features: lexical features; pre-selected words that are potentially relevant for sense distinctions. For example: for the all-words task, frequent content words in the corpus; for the lexical sample task, content words in the sentences containing the target word. Test for the presence/absence of each such word in the selected context. 14 / 56

Bag-of-words features: Example Example: (20.1) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps. pre-selected words: [fishing, big, sound, player, fly] feature vector: [0, 0, 0, 1, 0] 15 / 56
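A minimal sketch of the presence/absence test; the function name and the toy context are illustrative only:

def bag_of_words_features(context_words, selected_words):
    # 1 if the pre-selected word occurs anywhere in the context, else 0.
    context = set(context_words)
    return [1 if w in context else 0 for w in selected_words]

selected = ["fishing", "big", "sound", "player", "fly"]
context = "an electric guitar and bass player stand off to one side".split()
print(bag_of_words_features(context, selected))   # [0, 0, 0, 1, 0]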

More on features
collocational cues account for: collocational effects (bass+player = bass7); syntax-related sense differences (serve breakfast to customers vs. serve Philadelphia)
bag-of-words features account for topic- and domain-related effects; resemblance to semantic fields, frames, ...
complementary information: both feature types are usually combined
16 / 56

Combined representation: Example. Simplified representation for 2 sentences (... jazz bass player ... and ... smoked bass or ...): collocational features corresponding to a 1-word window; bag-of-words features for only fishing and player. 17 / 56

Combined representation, Weka (ARFF) format, for ... jazz bass player ... and ... smoked bass or ...:
@relation bass
@attribute wordl1 {jazz,smoke}
@attribute posl1 {CC,VBD}
@attribute wordr1 {player,or}
@attribute posr1 {CC,NN}
@attribute fishing {0,1}
@attribute player {0,1}
@attribute sense {s4,s7}
@data
jazz,CC,player,NN,0,1,s7
smoke,VBD,or,NN,0,0,s4
18 / 56

Method: any supervised algorithm (Decision Trees, for example J48; Decision Lists, similar to Decision Trees; Naive Bayes, probabilistic; ...) and any tool (Weka, R, SVMTool, your own implementation, ...). 19 / 56
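To make this step concrete, a hypothetical sketch using scikit-learn (one possible alternative to the tools listed above); the feature dictionaries and sense labels are invented toy data in the spirit of the combined representation:

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# One feature dictionary per labeled instance of "bass".
train_X = [
    {"w-1": "jazz",   "pos-1": "NN",  "w+1": "player", "pos+1": "NN", "player": 1},
    {"w-1": "smoked", "pos-1": "VBD", "w+1": "or",     "pos+1": "CC", "fishing": 0},
]
train_y = ["bass7", "bass4"]

vec = DictVectorizer()                    # one-hot encodes string-valued features
clf = MultinomialNB().fit(vec.fit_transform(train_X), train_y)

test = {"w-1": "electric", "w+1": "player", "player": 1}
print(clf.predict(vec.transform([test])))   # -> ['bass7']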

Interim Summary supervised approaches use sense-annotated datasets need many annotated examples for every word relevant information in the context: lexico-syntactic information (collocational features) lexical information (bag of words features) information is encoded in the form of features... and a classifier is trained to distinguish different senses of a given word 20 / 56

Outline Overview 1 Overview 2 3 4 5 21 / 56

Extrinsic evaluation. Long-term goal: improve performance in an end-to-end application. Extrinsic evaluation (or task-based, end-to-end, in vivo evaluation). Example: Word Sense Disambiguation for (Cross-Lingual) Information Retrieval, http://ixa2.si.ehu.es/clirwsd 22 / 56

Intrinsic evaluation. However, extrinsic evaluation is difficult and time-consuming. Intrinsic evaluation (or in vitro evaluation): treat the WSD component as if it were a stand-alone system. Measure: sense accuracy (percentage of words correctly tagged), Accuracy = matches / total. Method: held-out data from the same sense-tagged corpora used for training (train-test methodology). To standardize datasets and methods: SensEval and SemEval competitions. Example: our case study. 23 / 56

Baseline: the performance we would get without much knowledge / with a simple approach; necessary for any Machine Learning experiment (how good is 70%?). Simplest baseline: most frequent sense (WordNet: first-sense heuristic, since senses are ordered). A very powerful baseline, because of the skewed distribution of senses in corpora. BUT we need access to annotated data for every word in the dataset to estimate sense frequencies, so this is a knowledge-laden baseline. 24 / 56
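A small sketch combining the most-frequent-sense baseline with the sense-accuracy measure from the previous slide; the toy sense lists are invented:

from collections import Counter

def most_frequent_sense(train_senses):
    # Predict the sense seen most often in the annotated training data.
    return Counter(train_senses).most_common(1)[0][0]

def accuracy(predicted, gold):
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)

train = ["bass7", "bass7", "bass4", "bass7"]
gold  = ["bass7", "bass4", "bass7"]
mfs = most_frequent_sense(train)              # 'bass7'
print(accuracy([mfs] * len(gold), gold))      # 0.666...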

Ceiling, or upper bound, for performance: inter-coder agreement. All-words corpora using WordNet: A_o ≈ 0.75-0.8; more coarse-grained sense distinctions: A_o ≈ 0.9. Another possibility: avoid annotation by using pseudowords (e.g., banana-door). However, this is unrealistic: real polysemy is not like banana-doors! Need to find better ways to create pseudowords. 25 / 56

Outline Overview 1 Overview 2 3 4 5 26 / 56

Sense-labeled corpora give accurate information, but they are scarce! Need other sources: dictionaries, thesauri, selectional restrictions, ... Idea: use dictionaries as corpora (identifying related words in definitions and examples). 27 / 56

An example
Example (20.10): The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.
bank1. Gloss: a financial institution that accepts deposits and channels the money into lending activities. Examples: "he cashed a check at the bank"; "that bank holds the mortgage on my home"
bank2. Gloss: sloping land (especially beside a body of water). Examples: "they pulled the canoe up on the bank"; "he sat on the bank of the river"
Figure: WordNet information for two senses of bank. 28 / 56

Signatures. A signature is the set of words that characterizes a given sense of a target word, extracted from dictionaries, thesauri, tagged corpora, ... For example (20.10):
bank1: financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home
bank2: sloping, land, body, water, pull, canoe, bank, sit, river
30 / 56

Lesk Algorithm
function SIMPLIFIED LESK(word, sentence) returns best sense of word
  best-sense ← most frequent sense for word
  max-overlap ← 0
  context ← set of words in sentence
  for each sense in senses of word do
    signature ← set of words in the gloss and examples of sense
    overlap ← COMPUTEOVERLAP(signature, context)
    if overlap > max-overlap then
      max-overlap ← overlap
      best-sense ← sense
  end
  return(best-sense)
31 / 56

Lesk Algorithm
Example: "she strolled by the river bank"
best-sense ← bank1; max-overlap ← 0
context ← {she, stroll, river}
sense bank1: signature ← {financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home}; overlap ← 0; 0 > 0 fails
sense bank2: signature ← {sloping, land, body, water, pull, canoe, bank, sit, river}; overlap ← 1; 1 > 0 succeeds; best-sense ← bank2; max-overlap ← 1
return bank2
32 / 56
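A minimal, runnable Python version of SIMPLIFIED LESK, assuming the sense inventory is given as plain strings (gloss plus examples); it skips the lemmatization used in the trace above but reaches the same decision for the bank example:

def simplified_lesk(word, sentence, sense_inventory, stopwords=frozenset()):
    # sense_inventory: {sense: gloss-and-examples text}; the first entry
    # stands in for the most frequent sense (the default answer).
    context = {w for w in sentence.lower().split()
               if w != word and w not in stopwords}
    senses = list(sense_inventory)
    best_sense, max_overlap = senses[0], 0
    for sense, signature_text in sense_inventory.items():
        signature = set(signature_text.lower().split())
        overlap = len(signature & context)       # COMPUTEOVERLAP
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense
    return best_sense

inventory = {
    "bank1": "financial institution accept deposit channel money lending"
             " activity cash check hold mortgage home",
    "bank2": "sloping land body water pull canoe bank sit river",
}
print(simplified_lesk("bank", "she strolled by the river bank", inventory))
# bank2 (overlap 1, via 'river')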

Right intuition: words that appear in dictionary definitions and examples are relevant to a given sense. Problem: data sparseness (dictionary entries are short and do not always include examples). The Lesk algorithm is currently used as a baseline, BUT many extensions are possible and have been tried (generalizations over lemmata, corpus data, weighting, ...), AND dictionary-derived features can be (and are) used in standard supervised approaches. 33 / 56

Interim Summary. Information encoded in dictionaries (definitions, examples) is useful for WSD; it can be used exclusively or in addition to other information (collocations, bag of words) in supervised approaches. The Lesk algorithm disambiguates solely on the basis of dictionary information: the overlap between the dictionary entry and the context of the word occurrence. The most frequent sense and the Lesk algorithm are used as baselines for evaluation. 34 / 56

Overview we have a huge number of classes (senses) need large hand-built resources: supervised approaches need large annotated corpora (unrealistic) dictionary methods need large dictionaries, which, even if available, often do not provide enough information alternatives: Minimally supervised WSD Unsupervised WSD both make use of unannotated data these approaches are not as successful as supervised approaches 35 / 56

Minimally supervised WSD: Bootstrapping. For a given word (for example, plant): start with a small number of annotated examples (seeds) for each sense; collect additional examples for each sense based on their similarity to the annotated examples; iterate. 36 / 56

Bootstrapping: example plant (Yarowsky 1995) sense A: living entity; sense B: building first examples: those that appear with life (sense A) and manufacturing (sense B) Figure: Bootstrapping word senses. Figure 20.4 in Jurafsky & Martin. 37 / 56

Yarowsky 1995. Influential insights (used as heuristics in Yarowsky's algorithm): one sense per collocation (life+plant = plant A; manufacturing+plant = plant B); one sense per discourse (if a word appears multiple times in a text, probably all occurrences will bear the same sense), also useful to enlarge datasets. 38 / 56
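A schematic sketch of the bootstrapping loop (not Yarowsky's exact algorithm): train_classifier and predict_with_confidence are placeholders for any supervised learner (Yarowsky used decision lists), and the confidence threshold is an assumed parameter:

def bootstrap(seed_examples, unlabeled, train_classifier,
              predict_with_confidence, threshold=0.9, max_iter=10):
    labeled = list(seed_examples)            # [(features, sense), ...]
    for _ in range(max_iter):
        clf = train_classifier(labeled)
        newly_labeled, still_unlabeled = [], []
        for x in unlabeled:
            sense, confidence = predict_with_confidence(clf, x)
            if confidence >= threshold:      # accept only confident labels
                newly_labeled.append((x, sense))
            else:
                still_unlabeled.append(x)
        if not newly_labeled:                # nothing new: stop iterating
            break
        labeled.extend(newly_labeled)
        unlabeled = still_unlabeled
    return train_classifier(labeled)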

Unsupervised WSD: no previous knowledge, no human-defined word senses; simply group examples according to their similarity (clustering) and infer senses from that. Problem: hard to interpret and evaluate. 39 / 56

Outline Overview 1 Overview 2 3 4 5 40 / 56

Interim summary. WSD can be framed as a standard classification task: training data, feature definition, classifier, evaluation (supervised approaches). Most useful information: syntactic and lexical context (collocational features); words related to the different senses of a given word (bag-of-words features); words in dictionary (thesaurus, etc.) entries. Other approaches try to make use of unannotated data (bootstrapping, unsupervised learning); this would be great, but they are not as successful as supervised approaches (and harder to interpret and work with). 41 / 56

Useful empirical facts. Skewed distribution of senses: most-frequent-sense baseline, a heuristic when no other information is available, BUT the distribution varies with the text/corpus! (cone in a geometry textbook). One sense per collocation (bass+player = bass7): simple cues for sense classification (heuristic). One sense per discourse: different occurrences of a word in a given text tend to be used in the same sense; a heuristic for classification and for data gathering. 42 / 56

Conceptual problems. The task as currently defined does not allow for generalization over different words: learning is word-specific; the number of classes = the number of senses, equal to or greater than the number of words! Need training data for every sense of every word, but most words have low frequency (Zipf's law), and there is no chance with unknown words. This wouldn't be a problem if word sense alternation were like bank1 / bank2 (homonymy)... but many alternations are systematic! (regular polysemy, metonymy, metaphor) 43 / 56

Regular polysemy
conversion: bank (N), financial institution; bank (V), put money in a bank; same for sugar, hammer, tango, etc. (also derivation: -ize)
adjectives (Boleda 2007): qualitative vs. relational: cara familiar ('familiar face') vs. reunió familiar ('family meeting'); event-related vs. qualitative: fet sabut ('known fact') vs. home sabut ('wise man')
44 / 56

Regular polysemy: mass/count
animal/meat: chicken1: animal, chicken2: meat; lamb1: animal, lamb2: meat; ...
portions/kinds: two beers = two servings of beer or two types of beer
generally: thing/derived substance (grinding): After several lorries had run over the body, there was rabbit splattered all over the road.
45 / 56

Regular polysemy: verb alternations
causative/inchoative (Levin 1993): John broke the window / The window broke
Spanish psychological verbs: Le preocupa la situación ('the situation worries him/her'; Dative + Subject) vs. Bruna no quiere preocuparla ('Bruna does not want to worry her'; Subject + Accusative)
46 / 56

Contextual coercion / Logical metonymy (Also see course by Louise McNally.) object to eventuality (Pustejovsky 1995) Mary enjoyed the book. After three martinis, Kim felt much happier. adjectives (Pustejovsky 1995): event selection fast runner vs. fast typist vs. fast car 47 / 56

Metonymy
container/content: He drank a bottle of whisky. Morphology again: He drank a bottleful of whisky (-ful suffixation)
fruit/plant: olive, grapefruit, ... Spanish: often the tree is masculine (olivo, naranjo), the fruit feminine (oliva, naranja)
figure/ground: Kim painted the door / Kim walked through the door
48 / 56

Metonymy
country names: Location: I live in China. Government: The US and Libya have agreed to work together to solve ... Team (sports): England won last year's World Cup.
more generally, institutions: Barcelona applied for the Olympic Games. The banks won't give credit now. The newspapers criticized this policy.
object/person: The cello is playing badly.
Not so regular, contextual metaphor: The ham sandwich wants his check. (Lakoff & Johnson 1980)
49 / 56

Metaphor
physical → mental: depart1 / arrive1 / go1: physical transfer; depart2 / arrive2 / go2: mental transfer
concrete → abstract: aigua clara ('clear water') vs. estil clar ('clear style'); cabells negres ('black hair') vs. humor negre ('black humour')
50 / 56

To sum up: pervasive systematicity in sense alternations (regular polysemy, metonymy, metaphor), and it is productive: We found a little, hairy wampimuk sleeping behind the tree (McDonald & Ramscar 2001); Wampimuk soup is delicious! An inherent property of language: analogical reasoning (psychology again). WSD as currently handled cannot capture these regularities: a theoretical and practical problem! 51 / 56

WSD and regularities: what one can do. Generalize on FEATURES, e.g. jazz → MUSIC-STYLE = {jazz, rock, blues, ...}, provided some lexical resource is available that encodes this information: He is a jazz bass player. / I love bass solos in rock music. Problem: when (and how) to generalize? when to stop? 52 / 56

WSD and regularities: what would be desirable. Train on chicken and use the data for lamb, wampimuk, ... Resources such as WordNet encode the meat/animal distinction:
WordNet info for chicken: chicken1: the flesh of a chicken used for food; chicken2: a domesticated gallinaceous bird (hyponym); chicken3: a person who lacks confidence; chicken4: a foolhardy competition.
WordNet info for lamb: lamb1: young sheep; lamb2: a person easily deceived or cheated; lamb3: a sweet innocent mild-mannered person; lamb4: the flesh of a young domestic sheep eaten as food.
WHAT IS MISSING: links between chicken2 and lamb1, and between chicken1 and lamb4 (note the other senses). 53 / 56

Word Sense Disambiguation Computational Lexical Semantics Gemma Boleda 1 Stefan Evert 2 1 Universitat Politècnica de Catalunya 2 University of Osnabrück ESSLLI. Bordeaux, France, July 2009. 54 / 56

Classifier example 1: Naive Bayes. A probabilistic classifier (related to HMMs): choosing the best sense amounts to choosing the most probable sense given the feature vector (a conditional probability), BUT it is impossible to estimate this directly (too many feature combinations). Two strategies: decompose the probabilities (Bayes' rule), which makes them easier to estimate, and make an unrealistic assumption: the features are independent of each other given the sense (hence "Naive" Bayes). Training the classifier = estimating the probabilities from the sense-tagged corpus. 55 / 56
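A minimal sketch of the resulting decision rule, choosing the sense that maximizes P(s) * prod_j P(f_j | s); the smoothed probability tables are assumed to have been estimated from a sense-tagged corpus, and the numbers below are invented:

import math

def naive_bayes_sense(features, prior, likelihood):
    # prior[s] = P(s); likelihood[s][f] = P(f | s) (smoothed estimates).
    # Summing logs avoids numeric underflow for long feature vectors.
    def score(sense):
        return math.log(prior[sense]) + sum(
            math.log(likelihood[sense].get(f, 1e-6)) for f in features)
    return max(prior, key=score)

prior = {"bass4": 0.4, "bass7": 0.6}
likelihood = {"bass4": {"fish": 0.30, "player": 0.01},
              "bass7": {"fish": 0.01, "player": 0.25}}
print(naive_bayes_sense(["player"], prior, likelihood))   # bass7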

Classifier example 2: Decision Lists. Similar to decision trees (difference: only one condition per test).
Rule                    Sense
fish within window      bass4
striped bass            bass4
guitar within window    bass7
play/V bass             bass7
Figure: Decision list for the word bass
To learn a decision list classifier: generate and order the tests according to the training data. 56 / 56
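A minimal sketch of applying such a decision list, assuming the tests have already been generated and ordered (e.g., by log-likelihood ratio on the training data) and simplified here to single-word tests:

def apply_decision_list(rules, context_words, default_sense):
    # rules: ordered list of (test_word, sense); the first matching test wins.
    context = set(context_words)
    for test_word, sense in rules:
        if test_word in context:
            return sense
    return default_sense              # fall back to the most frequent sense

rules = [("fish", "bass4"), ("striped", "bass4"),
         ("guitar", "bass7"), ("play", "bass7")]
context = "an exciting jazz bass player with a guitar".split()
print(apply_decision_list(rules, context, "bass4"))   # bass7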