Vector Representations of Word Meaning in Context

Lea Frermann
Universität des Saarlandes
May 23, 2011

Outline

1 Introduction
2 Combining Vectors (Mitchell and Lapata (2008)) - Evaluation and Results
3 Modeling Word Meaning in Context in a Structured Vector Space (Erk and Padó (2008)) - Evaluation and Results
4 Syntactically Enriched Vector Models (Thater et al. (2010)) - Evaluation and Results
5 Conclusion

1 Introduction

Motivation

Context and syntactic structure are essential for modelling semantic similarity.

Example 1
(a) It was not the sales manager who hit the bottle that day, but the office worker with the serious drinking problem.
(b) That day the office manager, who was drinking, hit the problem sales worker with a bottle, but it was not serious.

Example 2
(a) catch a ball
(b) catch a disease
(c) attend a ball

Logical vs. Distributional Representation of Semantics

Modelling word semantics in a distributional way:
+ Rich and easily available resources
+ High coverage and robust
+ Little hand-crafting necessary
- Vectors represent the semantics of one word in isolation
- Compositionality is hard to achieve

Goal: augment vector representations in a way that allows the incorporation of context and syntactic information.

2 Combining Vectors (Mitchell and Lapata (2008))

Vector Representation of Word-level Semantics

            animal  stable  village  gallop  jockey
horse (u)        0       6        2      10       4
run   (v)        1       8        4       4       0

Vector dimensions: co-occurring words
Values: co-occurrence frequencies

Vector Composition

Define a set of possible models:

p = f(u, v, R, K)

p = resulting vector
f = function which combines the two vectors (addition, multiplication, or a combination of both)
u, v = vectors representing the individual words
R = syntactic relation between the words represented by u and v
K = additional knowledge

Vector Representation of Word-level Semantics

Fix the relation R and ignore additional knowledge K.
Independence assumption: only the i-th components of u and v influence the i-th component of p.

Additive model:       p_i = u_i + v_i
Multiplicative model: p_i = u_i * v_i

            animal  stable  village  gallop  jockey
horse (u)        0       6        2      10       4
run   (v)        1       8        4       4       0

Additive model:       p = [1 14 6 14 4]
Multiplicative model: p = [0 48 8 40 0]
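A minimal sketch of the two composition functions on the horse/run vectors above (numpy and the dimension ordering are assumptions for illustration):

```python
import numpy as np

# Co-occurrence vectors from the slide:
# dimensions = (animal, stable, village, gallop, jockey)
u = np.array([0, 6, 2, 10, 4])   # horse
v = np.array([1, 8, 4,  4, 0])   # run

def add(u, v):
    """Additive model: p_i = u_i + v_i."""
    return u + v

def multiply(u, v):
    """Multiplicative model: p_i = u_i * v_i."""
    return u * v

print(add(u, v))       # [ 1 14  6 14  4]
print(multiply(u, v))  # [ 0 48  8 40  0]
```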

Vector Representation of Word-level Semantics

Loosen the symmetry assumption: introduce weights, so that semantically important words can have a higher influence.

p_i = α·n_i + β·v_i

Optimized weights: α = 20 and β = 80
n = noun vector, v = verb vector

Vector Representation of Word-level Semantics

Corresponds to the model introduced in Kintsch (2001): re-introduce additional knowledge K.
K (here d) = vectors of the n distributional neighbours of the predicate; this makes the additive model sensitive to syntactic structure.

p = u + v + d

Kintsch's optimal parameters:
- m = 20: neighbours most similar to the predicate
- k = 1: from these m, the number of neighbours most similar to its argument

Vector Representation of Word-level Semantics

Combine the additive and multiplicative models; this avoids the multiplication-by-zero problem.

p_i = α·n_i + β·v_i + γ·n_i·v_i

Optimized weights: α = 0, β = 95 and γ = 5
n = noun vector, v = verb vector
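A sketch of the weighted additive and combined models, under the assumption that the reported optimal weights are percentages (20/80 read as 0.2/0.8, and 0/95/5 as 0/0.95/0.05):

```python
import numpy as np

def weighted_add(n, v, alpha=0.2, beta=0.8):
    """Weighted additive model: p_i = alpha * n_i + beta * v_i."""
    return alpha * n + beta * v

def combined(n, v, alpha=0.0, beta=0.95, gamma=0.05):
    """Combined model: p_i = alpha * n_i + beta * v_i + gamma * n_i * v_i.
    The additive term keeps components alive that multiplication would zero out."""
    return alpha * n + beta * v + gamma * (n * v)

n = np.array([0, 6, 2, 10, 4])   # noun vector (horse)
v = np.array([1, 8, 4,  4, 0])   # verb vector (run)
print(weighted_add(n, v))
print(combined(n, v))
```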

Evaluation

- How is the verb's meaning influenced by the context of its subject?
- Measure the similarity of the reference verb relative to landmarks
- Landmark = synonym of the reference verb in the context of the given subject, chosen to be as dissimilar as possible according to WordNet similarity

Noun            Reference  High       Low
The fire        glowed     burned     beamed
The face        glowed     beamed     burned
The child       strayed    roamed     digressed
The discussion  strayed    digressed  roamed
The sales       slumped    declined   slouched
The shoulders   slumped    slouched   declined

Figure: Example stimuli with High and Low similarity landmarks.

Evaluation: Pretests

- Compile a list of intransitive verbs from CELEX
- Extract all verb-subject pairs that occur more than 50 times in the British National Corpus
- Pair these verbs with two landmarks each
- Pick the subset of verbs with the least variation in human similarity ratings

Result: 15 verbs x 4 nouns x 2 landmarks = 120 sentences

Evaluation: Experiments

- Humans are shown a reference sentence and a landmark
- They rate similarity on a scale from 1 to 7
- Significant correlation between raters: inter-human agreement ρ = 0.40

Evaluation: Model Parameters

- 5 context words on either side of the reference verb
- 2000 most frequent context words as vector components
- Vector values: p(context word | target word) / p(context word)
- Cosine similarity for vector comparison
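A sketch of the two ingredients above, the probability-ratio vector components and cosine similarity; the count arguments and example vectors are illustrative placeholders:

```python
import numpy as np

def ratio_value(joint_count, target_count, context_count, corpus_size):
    """Vector component for context word c and target t:
    p(c | t) / p(c) = (count(t, c) / count(t)) / (count(c) / N)."""
    return (joint_count / target_count) / (context_count / corpus_size)

def cosine(u, v):
    """Cosine similarity, used to compare composed vectors with landmark vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([0, 6, 2, 10, 4])
v = np.array([1, 8, 4, 4, 0])
print(cosine(u, v))
```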

Evaluation: Results I

Figure: (a) Human ratings for High and Low similarity items; (b) Multiplicative model ratings for High and Low similarity items.

Evaluation: Results II

Model       High  Low   ρ
Noncomp     0.27  0.26  0.08**
Add         0.59  0.59  0.04*
WeightAdd   0.35  0.34  0.09**
Kintsch     0.47  0.45  0.09**
Multiply    0.42  0.28  0.17**
Combined    0.38  0.28  0.19**
UpperBound  4.94  3.25  0.40**

Figure: Model means for High and Low similarity items and correlation coefficients with human judgments (*: p < 0.05, **: p < 0.01).

Conclusion

- Component-wise vector multiplication outperforms vector addition
- Basic representation of word meaning as syntax-free, bag-of-words-based vectors
- The actual model instantiations are insensitive to syntactic relations and word order

Future work:
- Include more linguistic information
- Evaluation on larger and more realistic data sets

3 Modeling Word Meaning in Context in a Structured Vector Space (Erk and Padó (2008))

General Idea

Problem 1: Lack of syntactic information
Problem 2: Scaling up: a vector with fixed dimensionality can encode a fixed amount of information, but there is no limit on sentence length

- Construct a structured vector space containing a word's meaning as well as its selectional preferences
- Meaning of word a in the context of word b = combination of a with b's selectional preferences
- Re-introduce additional knowledge K into the models!

Representing Lemma Meaning

Represent each word w as a combination of vectors in vector space D:
a) one vector v modeling the lexical meaning
b) a set of vectors modeling w's selectional preferences:
   R : R → D (selectional preferences)
   R^{-1} : R → D (inverse selectional preferences)

w = (v, R, R^{-1})

Selectional Preferences

Selectional preference of word b for relation r = centroid of the seen filler vectors v_a:

R_b(r)_SELPREF = Σ_{a: f(a,r,b) > 0} f(a,r,b) · v_a

f(a,r,b) = frequency of a occurring in relation r to b in the British National Corpus

Two Variations

Alleviate noise caused by infrequent fillers:

R_b(r)_SELPREF-CUT = Σ_{a: f(a,r,b) > θ} f(a,r,b) · v_a

Alleviate noise caused by low-valued vector dimensions:

R_b(r)_SELPREF-POW = <v_1^n, ..., v_m^n>
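A sketch of the three selectional-preference variants; fillers is assumed to be a dict mapping filler words to their frequencies f(a, r, b), and vectors maps words to their basic numpy vectors:

```python
import numpy as np

def selpref(fillers, vectors):
    """R_b(r): frequency-weighted sum (centroid) of the seen filler vectors."""
    dim = len(next(iter(vectors.values())))
    out = np.zeros(dim)
    for word, freq in fillers.items():
        if freq > 0:
            out += freq * vectors[word]
    return out

def selpref_cut(fillers, vectors, theta=10):
    """SELPREF-CUT: drop fillers seen at most theta times."""
    return selpref({w: f for w, f in fillers.items() if f > theta}, vectors)

def selpref_pow(fillers, vectors, n=20):
    """SELPREF-POW: raise each component of the preference vector to the
    n-th power to damp low-valued dimensions (reading of the slide's formula)."""
    return selpref(fillers, vectors) ** n
```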

Computing Meaning in Context

Example: the verb's meaning is combined with the centroid of the vectors of the verbs to which the noun can stand in an object relation.

Computing Meaning in Context

(a', b') = vector pair representing the meaning of word a = (v_a, R_a, R_a^{-1}) in the context of word b = (v_b, R_b, R_b^{-1}), linked by relation r ∈ R:

a' = (v_a ⊙ R_b^{-1}(r), R_a - {r}, R_a^{-1})
b' = (v_b ⊙ R_a(r), R_b, R_b^{-1} - {r})

(⊙ = a direct vector combination function, e.g. component-wise multiplication or addition)
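A minimal sketch of this combination for one relation, assuming pointwise multiplication as the combination function and numpy vectors throughout; the names in the commented usage are hypothetical:

```python
import numpy as np

def contextualize(v_a, pref_vector, combine=np.multiply):
    """Meaning of word a in the context of word b linked by relation r:
    combine a's lexical vector with b's (inverse) selectional preference
    vector for r. The combination function is left open in the model;
    pointwise multiplication is one option (assumption here)."""
    return combine(v_a, pref_vector)

# Hypothetical usage: the verb 'catch' in the context of its object 'ball'.
# v_catch      = basic lexical vector of 'catch'
# ball_obj_inv = centroid of the vectors of verbs that take 'ball' as object
# catch_in_ctx = contextualize(v_catch, ball_obj_inv)
```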

Vector Spaces

1. Bag-of-words space (BOW): co-occurrence frequencies of target and context word within a context window of 10 (as in Mitchell and Lapata)
2. Dependency-based space (SYN): target and context word must be linked by a valid dependency path

Evaluation I: Results, Part 1

BOW space
Model                 High  Low   ρ
Target only           0.32  0.32  0.0
Selpref only          0.46  0.40  0.06**
M&L                   0.25  0.15  0.20**
SELPREF               0.32  0.26  0.12**
SELPREF-CUT, θ = 10   0.31  0.24  0.11**
SELPREF-POW, n = 20   0.11  0.03  0.27**
Upper bound                       0.40

SYN space
Model                 High  Low   ρ
Target only           0.20  0.20  0.08**
Selpref only          0.27  0.21  0.16**
M&L                   0.13  0.06  0.24**
SELPREF               0.22  0.16  0.13**
SELPREF-CUT, θ = 10   0.20  0.13  0.13**
SELPREF-POW, n = 30   0.08  0.04  0.22**
Upper bound                       0.40

Figure: Mean cosine similarity for High and Low similarity items and correlation coefficients with human judgments (**: p < 0.01).

Evaluation I: Results, Part 2

Model             lex. vector   subj^{-1} vs. obj^{-1}
SELPREF           0.23 (0.09)   0.88 (0.07)
SELPREF-CUT (10)  0.20 (0.10)   0.72 (0.18)
SELPREF-POW (30)  0.03 (0.08)   0.52 (0.48)

Figure: Average similarity (and standard deviation); cosine similarity in SYN space.

Column 1: To what extent does the difference in method (combination with words' lexical vectors vs. selectional preference vectors) translate into a difference in predictions?
Column 2: Does syntax-aware vector combination make a difference?

Evaluation II: Settings

Paraphrase ranking for a broader range of constructions.

Data: SemEval-1 lexical substitution data set
- 10 instances of each of 200 target words in sentential contexts
- Contextually appropriate paraphrases for each instance, rated by humans

Subset of constructions used for evaluation:
(a) target intransitive verbs with noun subjects
(b) target transitive verbs with noun objects
(c) target nouns occurring as objects of verbs

Evaluation II: Settings

Rank paraphrases on the basis of their cosine similarity to:

SELPREF-POW (30)
- V-SUBJ: verb & the noun's subj^{-1} preferences
- V-OBJ: verb & the noun's obj^{-1} preferences
- N-OBJ: noun & the verb's obj preferences

Mitchell and Lapata
- Direct noun-verb combination

Evaluation II: Settings

Out-of-ten evaluation metric:

P_10 = (1 / |I|) · Σ_{i ∈ I} [ Σ_{s ∈ M_i ∩ G_i} f(s, i) / Σ_{s ∈ G_i} f(s, i) ]

G_i = gold paraphrases for item i
M_i = model's top ten paraphrases for i
f(s, i) = frequency of s as a paraphrase for i
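A sketch of the out-of-ten score, assuming gold is a dict mapping paraphrases to their frequencies f(s, i) and model_top10 is the model's ten highest-ranked paraphrases:

```python
def out_of_ten(model_top10, gold):
    """P_10 for one item: gold frequency mass covered by the model's top ten."""
    covered = sum(gold.get(s, 0) for s in model_top10)
    return covered / sum(gold.values())

def mean_out_of_ten(items):
    """Average over (model_top10, gold) pairs."""
    return sum(out_of_ten(m, g) for m, g in items) / len(items)
```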

Evaluation II: Results

Model                V-SUBJ  V-OBJ  N-OBJ
Target only          47.9    47.4   49.6
Selpref only         54.8    51.4   55.0
M&L                  50.3    52.2   53.4
SELPREF-POW, n = 30  63.1    55.8   56.9

Knowledge about a single context word (even if not necessarily informative) can already lead to a significant improvement.

Conclusion: Word Meaning in Context

- A model of word meaning and selectional preferences in a structured vector space
- Outperforms the bag-of-words model of Mitchell and Lapata
- Evaluation on a broader range of relations and realistic paraphrase candidates

Future work:
- Integrating information from multiple relations (e.g. both subject and object)
- Application of the models to more complex NLP problems

4 Syntactically Enriched Vector Models (Thater et al. (2010))

Basic Idea

- Assumes a richer internal structure of vector representations
- Models relation-specific co-occurrence frequencies
- Uses syntactic second-order vector representations
  - Reduces the data sparseness caused by the use of syntax
  - Makes vector transformations possible, which deals with the fact that vectors for words of different parts of speech contain complementary information

1st-Order Context Vectors

[w] = Σ_{r ∈ R, w' ∈ W} ω(w, r, w') · e_{(r, w')}

in vector space V1 spanned by { e_{(r, w')} | r ∈ R, w' ∈ W }

Example:
[knowledge] = < 5·(OBJ^{-1}, gain), 2·(CONJ^{-1}, skill), 3·(OBJ^{-1}, acquire), ... >
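A sketch of first-order vectors as sparse dictionaries over (relation, word) dimensions; the triples and weights below simply echo the slide's knowledge example rather than real corpus counts:

```python
from collections import defaultdict

def first_order(triples):
    """[w]: map each word to a sparse vector over (r, w') dimensions.
    triples: iterable of (w, r, w_prime, weight), with weight = omega(w, r, w')."""
    vectors = defaultdict(lambda: defaultdict(float))
    for w, r, w_prime, weight in triples:
        vectors[w][(r, w_prime)] += weight
    return vectors

# Toy triples echoing the slide's example for 'knowledge':
triples = [("knowledge", "obj-1", "gain", 5),
           ("knowledge", "conj-1", "skill", 2),
           ("knowledge", "obj-1", "acquire", 3)]
vec = first_order(triples)
print(dict(vec["knowledge"]))
```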

2nd-Order Context Vectors I

- All words that can be reached in the co-occurrence graph within 2 steps
- Dimensions (r, w', r', w''), generalized to (r, r', w'')
- Vectors contain paths of the form (r, r^{-1}, w') and relate a word to other words that are possible substitution candidates
- If r = OBJ and r' = OBJ^{-1}, then the coefficients of e_{(r, r', w')} in [[w]] characterize the distribution of verbs w' sharing objects with w.

2nd-Order Context Vectors II

[[w]] = Σ_{r, r' ∈ R, w' ∈ W} ( Σ_{w'' ∈ W} ω(w, r, w'') · ω(w'', r', w') ) · e_{(r, r', w')}

in vector space V2 spanned by { e_{(r, r', w')} | r, r' ∈ R, w' ∈ W }

Example:
[[acquire]] = < 15·(OBJ, OBJ^{-1}, gain), 6·(OBJ, CONJ^{-1}, skill), 42·(OBJ, OBJ^{-1}, purchase), ... >
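A sketch of deriving a second-order vector from the first-order vectors above by walking two dependency steps and multiplying the weights along each path; it assumes first_order() has been run over all triples, so intermediate words also have vectors:

```python
from collections import defaultdict

def second_order(word, fo_vectors):
    """[[w]]: sparse vector over (r, r', w') dimensions reached via two
    dependency steps, with omega(w, r, w'') * omega(w'', r', w') summed
    over all intermediate words w''."""
    vec = defaultdict(float)
    for (r, w_mid), weight1 in fo_vectors[word].items():
        for (r2, w_far), weight2 in fo_vectors[w_mid].items():
            vec[(r, r2, w_far)] += weight1 * weight2
    return vec
```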

Combining Context Vectors

[[w_{r:w'}]] = [[w]] ⊙ L_r([w'])    (⊙ = pointwise multiplication; L_r lifts the first-order vector into the second-order space along r)

[[acquire]]                  = < 15·(OBJ, OBJ^{-1}, gain), 6·(OBJ, CONJ^{-1}, skill), 42·(OBJ, OBJ^{-1}, purchase), ... >
L_OBJ([knowledge])           = <  5·(OBJ, OBJ^{-1}, gain), 2·(OBJ, CONJ^{-1}, skill),  3·(OBJ, OBJ^{-1}, acquire), ... >
[[acquire_{OBJ:knowledge}]]  = < 75·(OBJ, OBJ^{-1}, gain), 12·(OBJ, CONJ^{-1}, skill), 0·(OBJ, OBJ^{-1}, purchase), ... >
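A sketch of the contextualization step: lift the context word's first-order vector into the second-order space along the linking relation (read here as prefixing the relation, an assumption about L_r), then multiply pointwise:

```python
def lift(r, fo_vector):
    """L_r: lift a first-order vector into second-order space by prefixing
    the relation that links target and context word."""
    return {(r, r2, w2): val for (r2, w2), val in fo_vector.items()}

def contextualized(so_vector, r, fo_context_vector):
    """[[w_{r:w'}]]: pointwise product of [[w]] with the lifted [w'];
    dimensions not supported by the context word drop to zero."""
    lifted = lift(r, fo_context_vector)
    return {dim: val * lifted.get(dim, 0.0) for dim, val in so_vector.items()}
```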

Contextualization of Multiple Vectors

To contextualize a word with multiple context words, take the sum of the pairwise contextualizations:

[[w_{r_1:w_1, ..., r_n:w_n}]] = Σ_{k=1}^{n} [[w_{r_k:w_k}]]

Vector Space

- Obtain dependency trees from the parsed English Gigaword corpus (Stanford parser)
- This yields 3.9 million dependency triples
- Compute the vector space from the subset of triples exceeding a threshold in PMI and frequency of occurrence

Evaluation I: Procedure

Sentence: "Teacher education students will acquire the knowledge and skills required to [...]"
Paraphrases: gain 4; amass 1; receive 1

Compare the contextually constrained 2nd-order vector of the target verb to the unconstrained 2nd-order vectors of the paraphrase candidates:

[[acquire_{SUBJ:student, OBJ:knowledge}]] vs. [[gain]], [[amass]], [[receive]], ...

Evaluation I: Metrics

1. Out of ten (P_10)
2. Generalized Average Precision:

GAP = ( Σ_{i=1}^{n} I(x_i) · p_i ) / ( Σ_{i=1}^{R} I(y_i) · y̅_i )    with p_i = (1/i) Σ_{k=1}^{i} x_k

x_i = the weight of the i-th item according to the gold standard, or 0 if it does not appear there
I(x_i) = 1 if x_i > 0, 0 otherwise
y̅_i = average weight of the ranked gold standard list y_1, ..., y_i

GAP rewards the correct order of a ranked list.
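A sketch of GAP, assuming ranked_weights holds the gold weight of each item in the model's ranking (0 if the item is not in the gold standard) and gold_weights holds all gold weights, which form the ideal ranking in the denominator:

```python
def gap(ranked_weights, gold_weights):
    """Generalized average precision for one item."""
    numerator, running = 0.0, 0.0
    for i, x in enumerate(ranked_weights, start=1):
        running += x
        if x > 0:
            numerator += running / i      # p_i = mean of x_1..x_i
    denominator, running = 0.0, 0.0
    for i, y in enumerate(sorted(gold_weights, reverse=True), start=1):
        running += y
        if y > 0:
            denominator += running / i    # y_bar_i over the ideal ranking
    return numerator / denominator
```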

Evaluation I: Results

Model                        GAP    P_10
Random baseline              26.03  54.25
E&P (add, object)            29.93  66.20
E&P (min, subject & object)  32.22  64.86
1st-order contextualized     36.09  59.35
2nd-order uncontextualized   37.65  66.32
Full model                   45.94  73.11

Evaluation II

Rank the WordNet senses of a word w in context.

Word sense vector = centroid of the second-order vectors of the synset members + the centroid of the sense's hypernyms, scaled down by a factor of 10

Compare the contextually constrained 2nd-order vector of the target word to the unconstrained sense vectors.
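A sketch of the sense-vector construction described above, assuming second-order vectors are sparse dicts and using the 1/10 scaling from the slide; the helper names are hypothetical:

```python
from collections import defaultdict

def centroid(vectors):
    """Componentwise mean of a list of sparse vectors (dicts)."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for dim, val in vec.items():
            total[dim] += val
    return {dim: val / len(vectors) for dim, val in total.items()}

def sense_vector(member_vectors, hypernym_vectors):
    """Sense = centroid of synset member vectors + 0.1 * centroid of hypernym vectors."""
    sense = dict(centroid(member_vectors))
    for dim, val in centroid(hypernym_vectors).items():
        sense[dim] = sense.get(dim, 0.0) + 0.1 * val
    return sense
```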

Evaluation II: Results

Word     Present paper  WN-Freq  Combined
ask      0.344          0.369    0.431
add      0.256          0.164    0.270
win      0.236          0.343    0.381
average  0.279          0.291    0.361

Figure: Correlation of model predictions and human ratings (Spearman's ρ); upper bound: 0.544.

Conclusion

- A model for adapting vector representations of words according to their context
- Detailed syntactic information through the combination of 1st- and 2nd-order vectors
- Outperforms state-of-the-art systems and improves weakly supervised word sense assignment

Future work:
- Generalization to larger syntactic contexts by recursive integration of information

5 Conclusion

Conclusion

- Syntactic and contextual information is essential for vector representations of word meaning
- Multiplicative vector combination results in the most accurate models
- Context modelled as vector representations of a word's selectional preferences for each relation
- Context modelled via interacting 1st- and 2nd-order context vectors of words
- Evaluation on word sense similarity, paraphrase ranking and word sense ranking

Future work:
- Scale up the models to allow for more contextual information
- Adapt the models to more complex NLP applications

Thank you for your attention!

Bibliography

Katrin Erk and Sebastian Padó. A structured vector space model for word meaning in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), pages 897-906, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.

Jeff Mitchell and Mirella Lapata. Vector-based models of semantic composition. In Proceedings of ACL-08: HLT, pages 236-244, 2008.

Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. Contextualizing semantic representations using syntactically enriched vector models. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pages 948-957, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.