Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds


Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds
Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart, Germany
*SEM, Atlanta, June 13-14, 2013

Overview
- Motivation and Background
- Description of Compositionality Ratings & Data Sets
- Evaluation & Baselines
- Predicting Compound-Constituent Ratings: POS Feature Comparison, Syntax Feature Comparison
- Predicting Compound-Whole Ratings
- Conclusions

Motivation
- Vector Space Models (VSMs): explore the notion of similarity between a set of target objects within a geometric setting (Turney and Pantel, 2010; Erk, 2012).
- Distributional Semantics: exploit the distributional hypothesis (Firth, 1957; Harris, 1968) to determine co-occurrence features for vector space models that best describe the words, phrases, sentences, etc. of interest.
- Salient Distributional Features in VSMs: there is general knowledge about useful features, but not across phenomena.
- Linguist - Computational Linguist loop
- Phenomenon: German noun-noun compounds, such as Feuerwerk 'fireworks' (Feuer 'fire' + Werk 'opus').

Hypotheses
1. Targets in the vector space models are nouns (compound nouns, modifier nouns, head nouns): (i) adjectives and verbs provide the most salient features; (ii) syntax-based features outperform window-based features.
2. Contributions of modifier noun vs. head noun: distributional properties of heads are more salient than distributional properties of modifiers in predicting the degree of compositionality of the compounds.

German Noun-Noun Compounds
- German noun-noun compounds: combinations of two or more simplex nouns; the grammatical head is a noun (in German: the rightmost constituent); the modifier is a noun.
- Examples: Ahornblatt 'maple leaf', Obstkuchen 'fruit cake'
- Degree of Compositionality: semantic relatedness between the compound meaning and the meanings of its constituents.
- Examples (T=transparent; O=opaque):
  TT Ahornblatt (maple + leaf) 'maple leaf'
  OO Löwenzahn (lion + tooth) 'dandelion'
  TO Feuerzeug (fire + stuff) 'lighter'
  OT Fliegenpilz (fly + mushroom) 'toadstool'
- Dataset: 244 two-part noun-noun compounds

Compositionality Ratings
Two collections:
1. Compound-Constituent Ratings
2. Compound-Whole Ratings

Compound-Constituent Ratings
- Material: 450 concrete, depictable German noun compounds (we use a subset of these)
- Participants: 30 per compound
- Task: degree of compositionality of the compounds with respect to their first as well as their second constituent
- Scale: 1 (definitely opaque) to 7 (definitely transparent)
- Mode: paper+pen
- Data: rating means and standard deviations

Compound-Whole Ratings
- Material: 244 noun-noun compounds (subset of the above)
- Participants: 27-34 per compound
- Task: degree of compositionality of the compounds as a whole
- Scale: 1 (definitely opaque) to 7 (definitely transparent)
- Mode: Amazon Mechanical Turk (AMT)
- Data: rating means and standard deviations

Compositionality Ratings: Examples

Compound      whole meaning   literal modifier   literal head   whole rating   modifier rating   head rating
Ahornblatt    maple leaf      maple              leaf           6.03 ± 1.49    5.64 ± 1.63       5.71 ± 1.70
Löwenzahn     dandelion       lion               tooth          1.66 ± 1.54    2.10 ± 1.84       2.23 ± 1.92
Fliegenpilz   toadstool       fly/bow tie        mushroom       2.00 ± 1.20    1.93 ± 1.28       6.55 ± 0.63
Feuerzeug     lighter         fire               stuff          4.58 ± 1.75    5.87 ± 1.01       1.90 ± 1.03

(Mean ratings and standard deviations for the compound-whole, compound-modifier, and compound-head ratings.)

Compositionality Ratings: Distribution (1)

Vector Space Models: Setup
- Goal: use VSMs to identify salient distributional features to predict the degree of compositionality of the compounds
- Corpora: two German web corpora
- Feature Values: local mutual information (Evert, 2005) of the co-occurrence counts between target nouns and features: LMI = O · log(O/E)
- Measure of Relatedness: cosine similarity as the predicted degree of compositionality
- Evaluation: cosine predictions against the human ratings; Spearman rank-order correlation coefficient ρ (Siegel and Castellan, 1988)
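A minimal sketch of this setup, assuming sparse co-occurrence dictionaries as input (not the authors' implementation; the example words in the usage comment are taken from the dataset above):

```python
# Build LMI-weighted co-occurrence vectors and score compositionality as the
# cosine between compound and constituent vectors; LMI = O * log(O / E).
import math
from collections import Counter, defaultdict

def lmi_vectors(cooc):
    """cooc: dict mapping (target, feature) -> observed co-occurrence count O.
    Returns dict target -> {feature: LMI}."""
    N = sum(cooc.values())
    target_totals, feature_totals = Counter(), Counter()
    for (t, f), o in cooc.items():
        target_totals[t] += o
        feature_totals[f] += o
    vectors = defaultdict(dict)
    for (t, f), o in cooc.items():
        expected = target_totals[t] * feature_totals[f] / N
        lmi = o * math.log(o / expected)
        if lmi > 0:                       # keep only positively associated features
            vectors[t][f] = lmi
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v1[f] * v2[f] for f in set(v1) & set(v2))
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical usage, evaluated with scipy.stats.spearmanr against human ratings:
# vectors = lmi_vectors(cooccurrence_counts)
# prediction = cosine(vectors["Ahornblatt"], vectors["Blatt"])  # compound vs. head
```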

Baseline and Upper Bound
- Upper Bound: correlations between human ratings: whole vs. compound-modifier; whole vs. compound-head; addition/multiplication: whole vs. compound-modifier +/× compound-head
- Baseline: random assignment of rating values in [1,7] to compound-modifier and compound-head pairs; correlation of the random values against the human ratings; addition/multiplication: whole vs. rand(compound-modifier) +/× rand(compound-head)

Baseline and Upper Bound

Function         Baseline ρ   Upper Bound ρ
modifier only    .0959        .6002
head only        .1019        .1385
addition         .1168        .7687
multiplication   .1079        .7829
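A minimal sketch of how the random baseline above could be computed, assuming uniform random ratings, a fixed seed, and correlation against the human compound-whole ratings (illustrative, not the authors' exact procedure):

```python
# Random baseline: assign random ratings in [1, 7] to the constituent pairs
# and correlate them (and their sum/product) with the human whole ratings.
import random
from scipy.stats import spearmanr

def random_baseline(whole_ratings, seed=42):
    """whole_ratings: list of mean compound-whole ratings, one per compound."""
    rng = random.Random(seed)
    rand_mod = [rng.uniform(1, 7) for _ in whole_ratings]   # rand(compound-modifier)
    rand_head = [rng.uniform(1, 7) for _ in whole_ratings]  # rand(compound-head)
    def rho(predictions):
        r, _ = spearmanr(predictions, whole_ratings)
        return r
    return {
        "modifier only": rho(rand_mod),
        "head only": rho(rand_head),
        "addition": rho([m + h for m, h in zip(rand_mod, rand_head)]),
        "multiplication": rho([m * h for m, h in zip(rand_mod, rand_head)]),
    }
```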

Corpus Data: German Web Corpora
1. sdewac (Faaß et al., 2010)
   - cleaned and parsed version of the German web corpus dewac created by the WaCky group (Baroni et al., 2009)
   - corpus cleaning: removing duplicates, disregarding syntactically ill-formed sentences, etc.
   - size: approx. 880 million words
   - disadvantage: sentences in the corpus are sorted alphabetically, so window co-occurrence refers to x words to the left and right BUT within the same sentence
2. WebKo
   - predecessor version of sdewac
   - size: approx. 1.5 billion words
   - disadvantage: less clean and not parsed

Window-based VSMs
- Hypothesis 1 (i): adjectives and verbs provide the most salient features (for describing noun compounds)
- Task: compare parts-of-speech in predicting compositionality
- Setup: specification of corpus, part-of-speech and window size; determine co-occurrence counts and calculate LMI values
- parts-of-speech: common nouns, adjectives, main verbs
- window sizes: 1, 2, 5, 10, 20 (..., 100)
- basis: lemmas; no punctuation
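An illustrative sketch of sentence-internal window co-occurrence extraction under this setup (the input format and POS filter are assumptions, not the authors' pipeline):

```python
# Collect co-occurrences between target nouns and context lemmas of selected
# parts-of-speech within a fixed window, restricted to the same sentence.
from collections import Counter

def window_cooccurrences(sentences, targets, window, keep_pos=("NN",)):
    """sentences: list of sentences, each a list of (lemma, pos) pairs.
    targets: set of target lemmas (compounds, modifiers, heads).
    Returns a Counter mapping (target, context lemma) -> co-occurrence count."""
    counts = Counter()
    for sentence in sentences:
        for i, (lemma, _) in enumerate(sentence):
            if lemma not in targets:
                continue
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)  # window stays sentence-internal
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx_lemma, ctx_pos = sentence[j]
                if ctx_pos in keep_pos:              # e.g. common nouns, adjectives, main verbs
                    counts[(lemma, ctx_lemma)] += 1
    return counts
```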

Window-based VSMs: Results
- NN > NN+ADJ+VV > VV > ADJ (significant)
- window sizes: 100 = 50 = 20 > 10 > 5 > 2 > 1
- WebKo > sdewac (significant; also with sentence-internal windows)
- best result: ρ = 0.6497 (WebKo, NN, window size: 20)

Syntax-based VSMs
- Hypothesis 1 (ii): syntax-based features outperform window-based features
- Task: compare the two co-occurrence conditions
- Setup: corpus choice: sdewac (parsed); specification of the syntactic function; determine co-occurrence counts and calculate LMI values
- syntactic functions (VS features):
  - nouns in verb subcategorisation: transitive and intransitive subjects; concatenation of both trans/intrans features (all subjects); direct objects; PP objects
  - noun-modifying adjectives
  - noun-modifying and noun-modified prepositions
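An illustrative sketch of syntax-based co-occurrence extraction from a parsed corpus; the tuple format and the relation labels ("OBJA", "ATTR") are placeholders, not the labels used in the original experiments:

```python
# Count co-occurrences of target nouns with words linked to them by selected
# syntactic functions, e.g. direct objects or noun-modifying adjectives.
from collections import Counter

def syntax_cooccurrences(parsed_sentences, targets, relations=("OBJA", "ATTR")):
    """parsed_sentences: list of sentences, each a list of
    (lemma, pos, head_index, deprel) tuples; head_index is 1-based, 0 = root.
    Returns a Counter mapping (target, relation, co-occurring lemma) -> count."""
    counts = Counter()
    for sentence in parsed_sentences:
        for lemma, _pos, head_idx, deprel in sentence:
            if deprel not in relations or head_idx == 0:
                continue
            head_lemma = sentence[head_idx - 1][0]
            if lemma in targets:        # target noun as dependent, e.g. a direct object
                counts[(lemma, deprel, head_lemma)] += 1
            if head_lemma in targets:   # target noun as head, e.g. modified by an adjective
                counts[(head_lemma, deprel, lemma)] += 1
    return counts
```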

Syntax-based VSMs: Results

Syntax-based VSMs: Results
- window-based > syntax-based
- noun-modifying adjectives ≈ adjectives in window 20
- verbs in window 20 > verb subcategorisation; best verb subcategorisation function: direct object
- abstracting over subject (in)transitivity > specific functions
- concatenation worse than the best individual functions

Role of Modifiers vs. Heads (1)
- Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers
- Perspective (i): salient features for compound-modifier vs. compound-head pairs
- Setup: same as before (window-based and syntax-based); distinguish the evaluation of the 244 compound-modifier predictions vs. the 244 compound-head predictions (instead of abstracting over the constituent type, using all 488 predictions)

Role of Modifiers vs. Heads (1): Results for Windows
- window-based: NN > NN+ADJ+VV > VV > ADJ (same as before)
- window sizes: 20 > 10 > 5 > 2 > 1 (same as before)
- small windows: compound-head > compound-modifier predictions
- larger windows: the difference vanishes

Role of Modifiers vs. Heads (1): Results for Syntax
- syntax-based: window-based > syntax-based (as before)
- compound-head > compound-modifier predictions (exception: transitive subjects)
- patterns with regard to function types vary (in comparison to the previous models, and for modifiers vs. heads)

Role of Modifiers vs. Heads (2)
- Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers
- Perspective (ii): contribution of modifiers vs. heads to the compound meaning
- Setup: window-based, window 20, across parts-of-speech; correlate only one type of compound-constituent predictions with the compound-whole ratings; apply addition/multiplication in correspondence to the upper bound
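A brief sketch of perspective (ii) as described above, assuming the modifier-only and head-only cosine predictions are already computed (names and the plain sum/product combination are illustrative):

```python
# Correlate modifier-only / head-only predictions (and their additive and
# multiplicative combinations) with the human compound-whole ratings.
from scipy.stats import spearmanr

def constituent_contributions(pred_modifier, pred_head, whole_ratings):
    """pred_modifier, pred_head: cosine(compound, constituent) per compound."""
    def rho(predictions):
        r, _ = spearmanr(predictions, whole_ratings)
        return r
    return {
        "modifiers only": rho(pred_modifier),
        "heads only": rho(pred_head),
        "addition": rho([m + h for m, h in zip(pred_modifier, pred_head)]),
        "multiplication": rho([m * h for m, h in zip(pred_modifier, pred_head)]),
    }
```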

Role of Modifiers vs. Heads (2): Results
- impact of distributional semantics: modifiers > heads
- multiplication ≈ modifiers only
- multiplication > addition

Summary
- Hypothesis 1 (i): against our intuition, not adjectives or verbs but nouns provided the most salient distributional information.
- Hypothesis 1 (ii): syntax-based predictions were all worse than or equal to the predictions by the respective window-based parts-of-speech.
- Best Model: nouns within a 20-word window (ρ = 0.6497)

Summary
- Hypothesis 2 (i): the salient features for predicting similarities between compound-modifier vs. compound-head pairs differ: in small windows, the distributional similarity between compounds and heads > compounds and modifiers, but the difference vanishes in larger contexts.
- Hypothesis 2 (ii): the influence of the modifier meaning on the compound meaning is stronger than the influence of the head meaning, both in the human ratings and in the VSMs.
- Future Work: learn more about the semantic role of modifiers vs. heads in noun-noun compounds (as do Gagné and Spalding, 2009, 2011, among others).

Compositionality Ratings: Distribution (2)

Window-based VSMs: Results (Context Windows Only)
Sentence-internal (sdewac, nouns only) vs. sentence-external (WebKo, nouns only).