Integrating Distributional Lexical Contrast into Word Embeddings for Antonym Synonym Distinction

Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu
Institute for Natural Language Processing (IMS), University of Stuttgart
ACL-2016, Berlin

Overview
1. Introduction
   1.1. Word Vector Representations
   1.2. Antonym-Synonym Distinction Task
2. Contributions
   2.1. Improving Weights of Feature Vectors
   2.2. Distributional Lexical Contrast Embeddings Model
3. Experiments
4. Conclusion

ACL-2016: Antonym-Synonym Distinction 2 / 20

1.1. Word Vector Representations

1.1.1. Distributional Semantic Model (DSM)
- A means of representing the meanings of words as vectors.
- DSMs rely on the distributional hypothesis (Harris, 1954): words with similar distributions have related meanings.
- Each weighted feature can be:
  - a co-occurrence frequency;
  - an association measure: local mutual information (LMI) (Evert, 2005).

1.1.2. Word Embeddings
- Representing words as low-dimensional dense vectors.
- Words with similar distributions have similar vectors.
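To make the LMI weighting concrete, the following is a minimal sketch rather than the authors' code. It assumes the usual definition LMI(w, f) = O · log(O / E), where O is the observed co-occurrence count #(w, f) and E = #(w) · #(f) / N is the count expected under independence. The function name `positive_lmi` and the toy counts are invented for illustration.

```python
import math
from collections import Counter

def positive_lmi(pair_counts):
    """Return positive LMI weights for observed (word, feature) pairs.

    Assumes LMI(w, f) = O * log(O / E), with O = #(w, f) and
    E = #(w) * #(f) / N. Pairs with non-positive LMI are dropped.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    feature_totals = Counter()
    for (word, feature), count in pair_counts.items():
        word_totals[word] += count
        feature_totals[feature] += count

    weights = {}
    for (word, feature), observed in pair_counts.items():
        expected = word_totals[word] * feature_totals[feature] / total
        lmi = observed * math.log(observed / expected)
        if lmi > 0:  # keep only positive associations
            weights[(word, feature)] = lmi
    return weights

# Toy co-occurrence counts, invented for illustration only.
toy_counts = {
    ("formal", "conception"): 8,
    ("formal", "issue"): 2,
    ("informal", "rumor"): 6,
    ("informal", "issue"): 4,
}
lmi_weights = positive_lmi(toy_counts)
```

In this toy table, "formal" co-occurs with "issue" less often than chance would predict, so that pair gets a negative LMI and is dropped from the positive-LMI vector.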

1.2. Antonym-Synonym Distinction Task

Goal
- Distinguishing antonyms from synonyms.

Problems
- DSMs tend to capture both antonyms (formal-informal) and synonyms (formal-conventional).
- Word embeddings represent both antonyms and synonyms with similar vectors.

Causes
- Antonymy and synonymy are paradigmatic relations.
- Antonyms and synonyms often occur in similar contexts.

2.1. Improving Weights of Feature Vectors

Goal:
- Improving the quality of weighted feature vectors.
- Strengthening the most salient features in the vectors.

Solution:
- Using the lexical contrast information of the target words and their contexts.
- Proposing a new weight for feature vectors.
- Representing words in a DSM with positive LMI weights.
- For each target word w:
  - determining the sets of antonyms A(w) and synonyms S(w);
  - determining the set of shared words W(f) for each feature f;
  - computing the new weight (called weight^SA) as follows:

$$
\mathrm{weight}^{SA}(w, f) =
\underbrace{\frac{1}{\#(w,u)} \sum_{u \in W(f) \cap S(w)} \mathrm{sim}(w, u)}_{\text{average similarity to synonyms}}
\;-\;
\underbrace{\frac{1}{\#(w',v)} \sum_{w' \in A(w)} \; \sum_{v \in W(f) \cap S(w')} \mathrm{sim}(w', v)}_{\text{average similarity to antonyms}}
$$

w = formal, S(w) = {methodical, precise, conventional, ...}
A(w) = w' = informal, S(w') = {irregular, unofficial, unconventional, ...}
Features: f = conception, f = issue, f = rumor

weight^SA(formal, conception) > 0
weight^SA(formal, issue) ≈ 0
weight^SA(formal, rumor) < 0

$$
\mathrm{weight}^{SA}(w, f) =
\frac{1}{\#(w,u)} \sum_{u \in W(f) \cap S(w)} \mathrm{sim}(w, u)
\;-\;
\frac{1}{\#(w',v)} \sum_{w' \in A(w)} \; \sum_{v \in W(f) \cap S(w')} \mathrm{sim}(w', v)
$$
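The weight^SA definition can be tried out on a toy lexicon. The sketch below rests on several assumptions that the slides do not confirm:
- sim is cosine similarity over sparse feature vectors.
- W(f) is the set of words whose vector contains feature f.
- The normalisers #(w,u) and #(w',v) are the numbers of pairs being summed, and an empty sum contributes 0.
- The antonym term compares each antonym w' with its own synonyms v.

All vectors and word lists are invented, and `weight_sa` is a hypothetical helper name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean(values):
    """Average of a list, defined as 0.0 for an empty list (assumption)."""
    return sum(values) / len(values) if values else 0.0

def weight_sa(w, f, vectors, synonyms, antonyms):
    """Synonym-side average similarity minus antonym-side average similarity."""
    shared = {word for word, vec in vectors.items() if f in vec}  # W(f)
    syn_sims = [cosine(vectors[w], vectors[u])
                for u in shared & synonyms.get(w, set())]
    ant_sims = [cosine(vectors[w_prime], vectors[v])
                for w_prime in antonyms.get(w, set())
                for v in shared & synonyms.get(w_prime, set())]
    return mean(syn_sims) - mean(ant_sims)

# Toy feature vectors, invented for illustration only.
vectors = {
    "formal": {"conception": 3.0, "issue": 1.0},
    "precise": {"conception": 2.0},
    "conventional": {"conception": 1.0, "issue": 1.0},
    "informal": {"rumor": 3.0, "issue": 1.0},
    "unofficial": {"rumor": 2.0, "issue": 1.0},
    "irregular": {"rumor": 1.0},
}
synonyms = {
    "formal": {"precise", "conventional"},
    "informal": {"unofficial", "irregular"},
}
antonyms = {"formal": {"informal"}}

w_conception = weight_sa("formal", "conception", vectors, synonyms, antonyms)
w_issue = weight_sa("formal", "issue", vectors, synonyms, antonyms)
w_rumor = weight_sa("formal", "rumor", vectors, synonyms, antonyms)
```

With these toy vectors, "conception" is shared only with synonyms of "formal" and gets a clearly positive weight. "rumor" is shared only with synonyms of "informal" and gets a clearly negative one. "issue" is shared by both sides, so the two averages nearly cancel.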

2.2. Distributional Lexical Contrast Embeddings Model (dlce)

Aims:
- Learning word embeddings.
- Moving synonyms closer to each other in space.
- Moving antonyms further away from each other in space.

Solution:
- Integrating distributional lexical contrast into the transformation of the Skip-gram model (Mikolov et al., 2013; Levy et al., 2014).
- Applying lexical contrast to every single context of the target word.
- The proposed objective function is as follows:

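$$
\sum_{w \in V} \sum_{c \in V} \Big\{
\big( \#(w,c) \log \sigma(\mathrm{sim}(w,c)) + k\,\#(w)\,p_0(c) \log \sigma(-\mathrm{sim}(w,c)) \big)
+ \Big( \frac{1}{\#(w,u)} \sum_{u \in W(c) \cap S(w)} \mathrm{sim}(w,u)
- \frac{1}{\#(w,v)} \sum_{v \in W(c) \cap A(w)} \mathrm{sim}(w,v) \Big)
\Big\}
$$

The four terms correspond to:
- #(w,c) log σ(sim(w,c)): distribution of target word and contexts
- k #(w) p_0(c) log σ(-sim(w,c)): distribution of negative contexts
- (1 / #(w,u)) Σ sim(w,u) over u ∈ W(c) ∩ S(w): distribution of synonymous pairs
- (1 / #(w,v)) Σ sim(w,v) over v ∈ W(c) ∩ A(w): distribution of antonymous pairs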
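The objective can be evaluated directly on a tiny vocabulary to see how the contrast term rewards a good geometry. This is a sketch under stated assumptions, not the authors' training code:
- sim is a dot product.
- p_0(c) is the unigram distribution over contexts.
- W(c) is the set of words observed with context c.
- The normalisers are the numbers of pairs being summed, and an empty sum contributes 0.
- The outer sums run over the words and contexts seen in the counts.

The names `dlce_objective`, `separated`, and `collapsed` are invented for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(values):
    """Average of a list, defined as 0.0 for an empty list (assumption)."""
    return sum(values) / len(values) if values else 0.0

def dlce_objective(pair_counts, word_vecs, ctx_vecs, synonyms, antonyms, k=1):
    """Evaluate the SGNS terms plus the lexical contrast terms."""
    total = sum(pair_counts.values())
    word_count, ctx_count, words_with_ctx = {}, {}, {}
    for (w, c), n in pair_counts.items():
        word_count[w] = word_count.get(w, 0) + n
        ctx_count[c] = ctx_count.get(c, 0) + n
        words_with_ctx.setdefault(c, set()).add(w)  # W(c)

    objective = 0.0
    for w in word_count:
        for c in ctx_count:
            s = dot(word_vecs[w], ctx_vecs[c])
            p0 = ctx_count[c] / total  # assumed unigram context distribution
            sgns = (pair_counts.get((w, c), 0) * math.log(sigmoid(s))
                    + k * word_count[w] * p0 * math.log(sigmoid(-s)))
            syn = [dot(word_vecs[w], word_vecs[u])
                   for u in words_with_ctx[c] & synonyms.get(w, set())]
            ant = [dot(word_vecs[w], word_vecs[v])
                   for v in words_with_ctx[c] & antonyms.get(w, set())]
            objective += sgns + mean(syn) - mean(ant)
    return objective

# Toy data, invented for illustration only.
counts = {("formal", "issue"): 2, ("conventional", "issue"): 1,
          ("informal", "issue"): 1}
syn = {"formal": {"conventional"}, "conventional": {"formal"}}
ant = {"formal": {"informal"}, "informal": {"formal"}}
ctx = {"issue": [0.0, 0.0]}  # zero context vector keeps the SGNS terms equal

separated = {"formal": [1.0, 0.0], "conventional": [1.0, 0.0],
             "informal": [-1.0, 0.0]}
collapsed = {"formal": [1.0, 0.0], "conventional": [1.0, 0.0],
             "informal": [1.0, 0.0]}

gain = (dlce_objective(counts, separated, ctx, syn, ant)
        - dlce_objective(counts, collapsed, ctx, syn, ant))
```

Because the context vector is zero, both configurations have identical SGNS terms. The difference `gain` therefore comes only from the contrast terms, which favour the configuration where the antonym points away from the target word.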

3. Experiments
- Evaluating weight^SA on the antonym-synonym distinction task.
- Evaluating the effects of the dlce model on:
  - the antonym-synonym distinction task;
  - the similarity task.

3.1. Antonym-Synonym Distinction

Corpus: ENCOW14A (Schäfer and Bildhauer, 2012), which contains 14.5 billion tokens.
Dataset: a gold standard resource of paradigmatic relation pairs (Roth and Schulte im Walde, 2014).

Word Class   Ant-pairs   Syn-pairs   Total
Adjective    300         300         600
Noun         350         350         700
Verb         400         400         800

- Using average precision (AP) to evaluate.
- Using box-plots to compare the cosine medians of antonymous vs. synonymous pairs.

AP evaluation results (1):

                   Adjectives     Nouns          Verbs
                   ANT    SYN     ANT    SYN     ANT    SYN
LMI                0.46   0.56    0.42   0.60    0.42   0.62
weight^SA          0.36   0.75    0.40   0.66    0.38   0.71
LMI + SVD          0.46   0.55    0.46   0.55    0.44   0.58
weight^SA + SVD    0.36   0.76    0.40   0.66    0.38   0.70

(1) χ² test; significance levels: p < .001, p < .005, p < .05
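For readers unfamiliar with the metric, here is a minimal sketch of average precision over word pairs ranked by cosine. It uses the standard definition (the mean of precision@k over the ranks k at which a pair of the target class appears). The slides do not spell out the exact ranking protocol, so sorting by descending cosine is an assumption here, and the scores and labels are invented.

```python
def average_precision(scores, labels, positive):
    """Average precision of class `positive` when pairs are ranked by score."""
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy cosine scores and relation labels, invented for illustration only.
cosines = [0.9, 0.8, 0.4, 0.1]
relations = ["SYN", "ANT", "SYN", "ANT"]

ap_syn = average_precision(cosines, relations, "SYN")
ap_ant = average_precision(cosines, relations, "ANT")
```

Under this ranking, a weighting scheme that separates the two relations well pushes synonym pairs towards the top. That corresponds to a higher SYN score and a lower ANT score, the pattern seen in the weight^SA rows of the table.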

Results in box-plots:

[Figure: box-plots of cosine scores for antonymous (ANT) vs. synonymous (SYN) pairs, in three panels (ADJ, NOUN, VERB), each comparing LMI, WeightSA, LMI_SVD, and WeightSA_SVD.]

3.2. Effects of dlce model

3.2.1. Antonym-Synonym Distinction
- Dataset: the gold standard resource of paradigmatic relation pairs (Roth and Schulte im Walde, 2014).
- Using area under curve (AUC) to evaluate the identification of antonyms.
- Comparison models: Skip-gram (SGNS), mlcm (Pham et al., 2015).

Results:

Model   Adjectives   Nouns   Verbs
SGNS    0.64         0.66    0.65
mlcm    0.85         0.69    0.71
dlce    0.90         0.72    0.81
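The slides do not say which curve the AUC refers to. The sketch below assumes the ROC curve and computes its area through the equivalent rank statistic: the probability that a randomly chosen antonym pair receives a higher "antonymy score" than a randomly chosen synonym pair, with ties counting one half. Using the negated cosine as the antonymy score is also an assumption, and the toy values are invented.

```python
def roc_auc(scores, is_positive):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy cosine similarities between the two words of each pair (invented).
cosines = [0.9, 0.7, 0.2, -0.1]
is_antonym = [False, True, False, True]

# Lower cosine is taken to mean "more antonym-like" (assumption).
antonymy_scores = [-c for c in cosines]
auc = roc_auc(antonymy_scores, is_antonym)
```

An AUC of 0.5 corresponds to a chance-level ranking and 1.0 to a perfect separation of antonym pairs from synonym pairs.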

3.2.2. Similarity Task
- Dataset: SimLex-999 (Hill et al., 2015).
- Using the Spearman correlation coefficient ρ to evaluate.
- Comparison models: SGNS, mlcm.

Results:

SGNS   mlcm   dlce
0.38   0.51   0.59
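As a reminder of what the similarity evaluation measures, the following sketch computes Spearman's ρ as the Pearson correlation of rank-transformed scores, with tied values sharing their average rank. The human ratings and model scores are invented and are not taken from SimLex-999.

```python
import math

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy human similarity ratings and model cosine scores (invented).
human_ratings = [9.0, 7.5, 3.0, 1.2]
model_scores = [0.8, 0.9, 0.3, 0.1]

rho = spearman(human_ratings, model_scores)
```

Only the ordering of the scores matters here, so a model can reach a high ρ without matching the scale of the human ratings.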

4. Conclusion

We have presented two methods to address the task of antonym-synonym distinction:
- Improving the quality of weighted feature vectors.
- Integrating distributional lexical contrast into word embeddings.

The results from the experiments show that our approaches can model semantic similarity and distinguish between antonyms and synonyms.

Thank you!