Using WordNet to Supplement Corpus Statistics

Rose Hoberman and Roni Rosenfeld
Sphinx Lunch, November 14, 2002

Data, Statistics, and Sparsity

- Statistical approaches need large amounts of data
- Even with lots of data there is a long tail of infrequent events (in 100MW, over half of word types occur only once or twice)
- Problem: poor statistical estimation of rare events
- Proposed solution: augment the data with linguistic or semantic knowledge (e.g. dictionaries, thesauri, knowledge bases, ...)

WordNet

- Large semantic network that groups words into synonym sets (synsets)
- Links synsets with a variety of linguistic and semantic relations
- Hand-built by linguists (drawing on theories of human lexical memory)
- Comes with a small sense-tagged corpus

WordNet: Size and Shape

Size: 110K synsets, lexicalized by 140K lexical entries
- 70% nouns, 17% adjectives, 10% verbs, 3% adverbs

Relations: 150K
- 60% hypernym/hyponym (IS-A)
- 30% similar-to (adjectives), member-of, part-of, antonym
- 10% other

WordNet Example: Paper IS-A ...

- paper IS-A material, stuff IS-A substance, matter IS-A physical object IS-A entity
- composition, paper, report, theme IS-A essay IS-A writing IS-A ... IS-A abstraction
- assignment IS-A ... IS-A work IS-A ... IS-A human act
- newspaper, paper IS-A print media IS-A ... IS-A instrumentality IS-A artifact IS-A entity
- newspaper, paper, newspaper publisher IS-A publisher, publishing house IS-A firm, house, business firm IS-A business, concern IS-A enterprise IS-A organization IS-A social group IS-A group, grouping IS-A ...

This Talk

Derive numerical word similarities from the WordNet noun taxonomy, and examine the usefulness of WordNet for two language modelling tasks:

1. Improve the perplexity of a bigram LM trained on very little data
   - Combine the bigram data of rare words with similar but more common proxies
   - Use WordNet to find similar words
2. Find words which tend to co-occur within a sentence
   - Long-distance correlations are often semantic
   - Use WordNet to find semantically related words

Measuring Similarity in a Taxonomy

- The structure of a taxonomy lends itself to calculating distances (or similarities)
- Simplest distance measure: length of the shortest path, in edges
- Problem: edges often span very different semantic distances. For example:
  - plankton IS-A living thing
  - rabbit IS-A leporid IS-A ... IS-A mammal IS-A vertebrate IS-A ... IS-A animal IS-A living thing

Measuring Similarity using Information Content

Resnik's method uses both the taxonomy's structure and corpus statistics: counts from a corpus give the probability of each concept in the taxonomy, and hence the information content of a concept. The similarity between two concepts is the information content of their least common ancestor:

    sim(c1, c2) = -log P(lca(c1, c2))

Other similarity measures were subsequently proposed.
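
Resnik's measure can be sketched in a few lines of pure Python. The toy taxonomy and corpus counts below are purely illustrative, not WordNet's own:

```python
import math

# Toy noun taxonomy: child -> parent, rooted at "entity".
PARENT = {
    "living_thing": "entity",
    "animal": "living_thing",
    "plant": "living_thing",
    "mammal": "animal",
    "bird": "animal",
    "rabbit": "mammal",
    "turkey": "bird",
    "chicken": "bird",
}

# Hypothetical corpus counts for the leaf concepts.
COUNTS = {"rabbit": 20, "turkey": 5, "chicken": 15, "plant": 60}
TOTAL = sum(COUNTS.values())

def ancestors(c):
    """Return c plus all of its ancestors, bottom-up to the root."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def concept_count(c):
    """A concept's count includes the counts of everything below it."""
    return sum(n for leaf, n in COUNTS.items() if c in ancestors(leaf))

def information_content(c):
    return -math.log(concept_count(c) / TOTAL)

def resnik_sim(c1, c2):
    """IC of the least common ancestor: walk up from c1 and take the
    first ancestor that also dominates c2."""
    anc2 = set(ancestors(c2))
    lca = next(a for a in ancestors(c1) if a in anc2)
    return information_content(lca)
```

Note how the measure behaves as intended: turkey/chicken meet at the specific (low-probability) concept "bird" and score higher than turkey/rabbit, which meet only at "animal"; concepts meeting at the root score zero.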

Similarity between Words

- Each word has many senses (multiple nodes in the taxonomy)
- Resnik's word similarity: the maximum similarity between any pair of their senses
- Alternative definition: the weighted sum of sim(c1, c2) over all pairs of senses c1 of w1 and c2 of w2, where more frequent senses are weighted more heavily
- For example: TURKEY vs. CHICKEN, TURKEY vs. GREECE

Improving Bigram Perplexity

- Combat sparseness: define equivalence classes and pool their data
- Automatic clustering, distributional similarity, ...
- But for rare words there is not enough information to cluster reliably
- Test whether the bigram distributions of semantically similar words (according to WordNet) can be combined to reduce the bigram perplexity of rare words

Combining Bigram Distributions

Simple linear interpolation of the target word t's smoothed bigram distribution with the maximum-likelihood distribution of its proxy s:

    p_s(· | t) = (1 − λ) p_gt(· | t) + λ p_ml(· | s)

- Optimize λ using 10-way cross-validation on the training set
- Evaluate by comparing the test-set perplexity of p_s(· | t) with that of the baseline model p_gt(· | t)
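
The interpolation can be sketched end-to-end on a toy corpus. Everything here is illustrative: the corpus is invented, λ is fixed rather than cross-validated, and add-alpha smoothing stands in for the smoothed baseline:

```python
import math
from collections import Counter

# Tiny illustrative corpus; "aspirations" plays the rare target word
# and "dreams" the more common proxy.
corpus = ("he had dreams of glory . she had dreams of peace . "
          "he had aspirations of glory .").split()

def bigram_dist(history, tokens, vocab, alpha=0.1):
    """Add-alpha-smoothed p(. | history) over the vocabulary."""
    counts = Counter(tokens[i + 1] for i, w in enumerate(tokens[:-1])
                     if w == history)
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab))
            for w in vocab}

vocab = sorted(set(corpus))
p_t = bigram_dist("aspirations", corpus, vocab)   # sparse target estimate
p_s = bigram_dist("dreams", corpus, vocab)        # better-estimated proxy

lam = 0.5  # fixed here; tuned by 10-way cross-validation in practice
p_interp = {w: (1 - lam) * p_t[w] + lam * p_s[w] for w in vocab}

def perplexity(dist, successors):
    return math.exp(-sum(math.log(dist[w]) for w in successors)
                    / len(successors))

# Hypothetical held-out successors of the target word:
test = ["of"]
```

On this toy data the proxy's better-estimated distribution pulls probability toward "of", so the interpolated model has lower held-out perplexity than the target-only estimate.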

Ranking Proxies

Score each candidate proxy s for target word t by one of:

1. WordNet similarity score: wsim_max(t, s)
2. KL divergence: D( p_gt(· | t) || p_ml(· | s) )
3. Training-set perplexity reduction of word s, i.e. the improvement in perplexity of p_s(· | t) over the 10-way cross-validated model
4. Random: choose a proxy randomly

Choose the highest-ranked proxy (ignoring the actual scales of the scores)
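
The KL-divergence ranking (method 2) reduces to picking the proxy whose successor distribution diverges least from the target's. A minimal sketch with hypothetical distributions over a tiny shared vocabulary:

```python
import math

def kl_divergence(p, q):
    """D(p || q); assumes q(w) > 0 wherever p(w) > 0, which holds when
    both distributions are smoothed over a shared vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

# Hypothetical successor distributions for a rare target word and two
# candidate proxies (not estimated from real data).
p_target = {"of": 0.6, "for": 0.3, "the": 0.1}
proxies = {
    "dreams": {"of": 0.55, "for": 0.35, "the": 0.10},
    "hill":   {"of": 0.10, "for": 0.10, "the": 0.80},
}

# Lower divergence = more similar bigram behaviour; pick the minimum.
best = min(proxies, key=lambda s: kl_divergence(p_target, proxies[s]))
```

Unlike the WordNet score, this ranking needs enough data to estimate the target's distribution at all, which is exactly what is scarce for rare words.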

Experiments

- 140MW of Broadcast News
- Test: 40MW reserved for testing
- Train: 9 random subsets of the training data (1MW to 100MW)
- From the nouns occurring in WordNet:
  - 150 target words (occurred < 2 times in 1MW)
  - 2000 candidate proxies (occurred > 50 times in 1MW)

Methodology

For each size of training corpus:

- Find the highest-scoring proxy for each target word under each ranking method
  - e.g. target word ASPIRATIONS; best proxies: SKILLS, DREAMS, DREAM/DREAMS, HILL
- Create the interpolated models and calculate their perplexity reduction on the test set
- Average perplexity reduction: the perplexity reduction achieved for each target word, averaged with weights proportional to each target word's frequency in the test set

[Figure 1: Perplexity reduction (percent) as a function of training data size (1MW to 100MW) for the four ranking methods: WordNet, Random, KLdiv, TrainPP.]

[Figure 2: Average perplexity reduction (percent) as a function of proxy rank for the four ranking methods: random, WNsim, KLdiv, cvpp.]

Error Analysis

Each target word's best proxy (the one with the largest test-set PP reduction) was categorized by the relation between the two words:

   %   Type of relation        Examples
  45   Not an IS-A relation    rug-arm, glove-scene
  40   Missing or weak in WN   aluminum-steel, bomb-shell
  15   Present in WN           blizzard-storm

Table 1: Classification of best proxies for 150 target words.

There were also a few topical relations (TESTAMENT-RELIGION) and domain-specific relations (BEARD-MAN).

Modelling Semantic Coherence

- N-grams only model short distances
- In real sentences, content words come from the same semantic domain
- Want to find long-distance correlations
- Incorporate a semantic similarity constraint into an exponential LM

Modelling Semantic Coherence II

- Find words that co-occur within a sentence
- Association statistics from data are only reliable for high-frequency words
- Long-distance associations are semantic
- Use WordNet?

Experiments

- Cheating experiment to evaluate the usefulness of WordNet
- Derive similarities from WordNet for frequent words only
- Compare to a measure of association calculated from large amounts of data (the ground truth)
- Question: are these two measures correlated?

Ground Truth

- 500,000 noun pairs
- Expected number of chance co-occurrences > 5
- Word-pair association measured by Yule's Q statistic:

    Q = (C11 C22 − C12 C21) / (C11 C22 + C12 C21)

  where the co-occurrence contingency counts are:

                   Word 1: Yes   Word 1: No
    Word 2: Yes       C11           C12
    Word 2: No        C21           C22

- Q ranges from −1 to 1
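
Computing Q from a 2x2 table is a one-liner; the counts below are invented for illustration, not from the Broadcast News data:

```python
def yules_q(c11, c12, c21, c22):
    """Yule's Q from a 2x2 co-occurrence contingency table:
    c11 = units containing both words, c12/c21 = exactly one,
    c22 = neither. Ranges from -1 (never together) to 1 (always)."""
    return (c11 * c22 - c12 * c21) / (c11 * c22 + c12 * c21)

# Illustrative: a strongly associated pair vs. a weakly associated one.
strong = yules_q(30, 5, 5, 960)   # co-occur far more than chance
weak = yules_q(2, 50, 50, 900)    # co-occur less than chance
```

Because Q depends only on the cross-product ratio of the table, it is comparable across word pairs with very different marginal frequencies, which suits a 500,000-pair comparison.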


[Figure 3: Looking for correlation: WordNet similarity scores versus Q scores for 10,000 noun pairs.]

[Figure: Density of Q scores for pairs with wsim > 6 versus all pairs.]

Only 0.1% of word pairs have WordNet similarity scores above 5, and only 0.03% are above 6.

[Figure 4: Precision versus recall, comparing the effectiveness of the two WordNet word similarity measures (weighted and maximum).]

  Relation type      Num        Examples
  WN                 277 (163)
    part/member       87 (15)   finger-hand, student-school
    phrase isa        65 (47)   death tax IS-A tax
    coordinates       41 (31)   house-senate, gas-oil
    morphology        30 (28)   hospital-hospitals
    isa               28 (23)   gun-weapon, cancer-disease
    antonyms          18 (13)   majority-minority
    reciprocal         8 (6)    actor-director, doctor-patient
  non-WN             461
    topical          336        evidence-guilt, church-saint
    news and events  102        iraq-weapons, glove-theory
    other             23        END of the SPECTRUM

Table 2: Error analysis.

Conclusions?

- Very small bigram PP improvement when little data is available
- Words with very high WN similarity do tend to co-occur within sentences; however, recall is poor because most co-occurrence relations are topical (though WN has been adding topical links)
- WordNet contains limited types and quantities of relationships compared to the spectrum of relationships found in real data
- WN word similarities are a weak source of knowledge for these two tasks

Possible Improvements, Other Directions?

- Interpolation weights should depend on the data AND the WordNet score, and on the relative frequency of the target and proxy word
- Improve the WN similarity measure:
  - consider the frequency of senses, but don't dilute strong relations
  - information content is misleading for rare but high-level concepts
  - learn a similarity function from large amounts of data?
  - learn which parts of the taxonomy are more reliable/complete?
- Consider an alternative framework: class → word / word → class
- Provide WN with more constraints (from the data)