Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds


1 Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds. Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart, Germany. *SEM, Atlanta, June 13-14, 2013.

2 Overview: Motivation and Background; Description of Compositionality Ratings & Data Sets; Evaluation & Baselines; Predicting Compound-Constituent Ratings; POS Feature Comparison; Syntax Feature Comparison; Predicting Compound Whole Ratings; Conclusions.

3 Motivation. Vector Space Models (VSMs): explore the notion of similarity between a set of target objects within a geometric setting (Turney and Pantel, 2010; Erk, 2012). Distributional Semantics: exploit the distributional hypothesis (Firth, 1957; Harris, 1968) to determine co-occurrence features for vector space models that best describe the words, phrases, sentences, etc. of interest. Salient Distributional Features in VSMs: there is general knowledge about useful features, but not across phenomena. Linguist - Computational Linguist loop. Phenomenon: German noun-noun compounds, such as Feuerwerk 'fireworks' (Feuer 'fire' + Werk 'opus').

4 Hypotheses. (1) Targets in the vector space models are nouns (compound nouns, modifier nouns, head nouns); we therefore expect that adjectives and verbs provide the most salient features, and that syntax-based features outperform window-based features. (2) Contributions of modifier noun vs. head noun: distributional properties of heads are more salient than distributional properties of modifiers in predicting the degree of compositionality of the compounds.

5 German Noun-Noun Compounds. German noun-noun compounds: combinations of two or more simplex nouns; the grammatical head is a noun (in German: the rightmost constituent); the modifier is a noun. Examples: Ahornblatt 'maple leaf', Obstkuchen 'fruit cake'. Degree of Compositionality: semantic relatedness between the compound meaning and the meanings of its constituents. Examples (T = transparent; O = opaque): TT Ahornblatt (maple + leaf); OO Löwenzahn (lion + tooth) 'dandelion'; TO Feuerzeug (fire + stuff) 'lighter'; OT Fliegenpilz (fly + mushroom) 'toadstool'. Dataset: 244 two-part noun-noun compounds.

6 Compositionality Ratings Two collections: 1 Compound Constituent Ratings 2 Compound Whole Ratings

7 Compound Constituent Ratings Material: 450 concrete, depictable German noun compounds (We use a subset of these) Participants: 30 per compound Task: degree of compositionality of the compounds with respect to their first as well as their second constituent Scale: 1 (definitely opaque) to 7 (definitely transparent) Mode: paper+pen Data: rating means and standard deviation

8 Compound Whole Ratings Material: 244 noun-noun compounds (subset of above) Participants: per compound Task: degree of compositionality of the compounds as a whole Scale: 1 (definitely opaque) to 7 (definitely transparent) Mode: Amazon Mechanical Turk (AMT) Data: rating means and standard deviation

9 Compositionality Ratings: Examples. Mean ratings and standard deviations, in the column order whole, modifier, head:
Ahornblatt 'maple leaf' (maple + leaf): 6.03 ± ± ± 1.70
Löwenzahn 'dandelion' (lion + tooth): 1.66 ± ± ± 1.92
Fliegenpilz 'toadstool' (fly/bow tie + mushroom): 2.00 ± ± ± 0.63
Feuerzeug 'lighter' (fire + stuff): 4.58 ± ± ± 1.03

10 Compositionality Ratings: Distribution (1)

11 Vector Space Models: Setup. Goal: use VSMs to identify salient distributional features that predict the degree of compositionality of the compounds. Corpora: two German web corpora. Feature Values: local mutual information (LMI; Evert, 2005) of co-occurrence counts between target nouns and features: $\mathrm{LMI} = O \cdot \log \frac{O}{E}$, where O is the observed and E the expected co-occurrence frequency. Measure of Relatedness: cosine between compound and constituent vectors, taken as the predicted degree of compositionality. Evaluation: cosine values against human ratings, using the Spearman Rank-Order Correlation Coefficient ρ (Siegel and Castellan, 1988).
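The setup above can be illustrated with a minimal Python sketch of LMI weighting followed by a cosine comparison. The function names, the dictionary-based vector representation, the toy counts, and the estimate of E as (target frequency × feature frequency) / corpus size are assumptions made for illustration; the slides do not specify how the expected frequency was obtained or how non-positive LMI values were handled.

```python
import math


def lmi_vector(cooc, target_total, feature_totals, grand_total):
    """Turn raw co-occurrence counts of one target into LMI weights.

    cooc: {feature: observed count O with the target}
    E is estimated here as target_total * feature_total / grand_total
    (an assumption; other estimates of E are possible).
    """
    weights = {}
    for feature, observed in cooc.items():
        expected = target_total * feature_totals[feature] / grand_total
        if observed > 0 and expected > 0:
            weights[feature] = observed * math.log(observed / expected)
    return weights


def cosine(u, v):
    """Cosine between two sparse vectors stored as {feature: weight}."""
    dot = sum(u[f] * v[f] for f in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy counts (invented for illustration, not taken from the corpora).
feature_totals = {"Baum": 50, "fallen": 40, "gruen": 30}
compound = lmi_vector({"Baum": 10, "fallen": 8}, 100, feature_totals, 1000)
head = lmi_vector({"Baum": 20, "gruen": 9}, 200, feature_totals, 1000)
predicted_compositionality = cosine(compound, head)
```

In this sketch the cosine between the compound vector and a constituent vector plays the role of the predicted compound-constituent rating.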

12 Baseline and Upper Bound. Upper Bound: correlations between human ratings: whole ratings vs. compound-modifier ratings; whole ratings vs. compound-head ratings; addition/multiplication: whole ratings vs. the sum/product of compound-modifier and compound-head ratings. Baseline: random assignment of rating values in [1,7] to compound-modifier and compound-head pairs; correlation of the random values against the human ratings; addition/multiplication: whole ratings vs. the sum/product of rand(compound-modifier) and rand(compound-head).
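As a rough sketch of how such a random baseline could be computed, the following Python code implements Spearman's ρ as the Pearson correlation of average ranks and correlates random values with a list of human ratings. The helper names, the fixed seed, and the choice of continuous uniform values in [1,7] are assumptions; the slides do not say whether random ratings were drawn as integers or reals.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def random_baseline(human_ratings, seed=0):
    """Correlate random values in [1, 7] with the human ratings."""
    rng = random.Random(seed)
    guesses = [rng.uniform(1, 7) for _ in human_ratings]
    return spearman(guesses, human_ratings)
```

The same `spearman` helper would serve for the upper bound by passing two lists of human ratings instead of one random and one human list.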

13 Baseline and Upper Bound (table): ρ for Baseline and Upper Bound, by function: modifier only, head only, addition, multiplication.

14 Corpus Data: German Web Corpora. (1) sdewac (Faaß et al., 2010): cleaned and parsed version of the German web corpus dewac created by the WaCky group (Baroni et al., 2009); corpus cleaning: removing duplicates, disregarding syntactically ill-formed sentences, etc.; size: approx. 880 million words; disadvantage: sentences in the corpus are sorted alphabetically, so window co-occurrence refers to x words to the left and right BUT within the same sentence. (2) WebKo: predecessor version of sdewac; size: approx. 1.5 billion words; disadvantage: less clean and not parsed.

15 Window-based VSMs. Hypothesis 1 (i): adjectives and verbs provide the most salient features (for describing noun compounds). Task: compare parts-of-speech in predicting compositionality. Setup: specify corpus, part-of-speech and window size; determine co-occurrence counts and calculate LMI values. Parts-of-speech: common nouns, adjectives, main verbs. Window sizes: 1, 2, 5, 10, 20 (, ). Basis: lemmas; no punctuation.
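A minimal Python sketch of sentence-internal window counting over lemmas with a part-of-speech filter might look as follows. The function name, the simplified tag labels ("NN", "ADJ", "VV"), and the decision to measure the window over all tokens before filtering by part-of-speech are assumptions for illustration, not details confirmed by the slides.

```python
from collections import Counter


def window_cooccurrences(sentence, targets, window, allowed_pos):
    """Count context lemmas around each target within one sentence.

    sentence: list of (lemma, pos) pairs with punctuation already removed
    targets: set of target lemmas (compounds, modifiers, heads)
    window: number of tokens considered to the left and to the right
    allowed_pos: set of part-of-speech labels kept as features
    """
    counts = {}
    for i, (lemma, _pos) in enumerate(sentence):
        if lemma not in targets:
            continue
        context = counts.setdefault(lemma, Counter())
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue
            ctx_lemma, ctx_pos = sentence[j]
            if ctx_pos in allowed_pos:
                context[ctx_lemma] += 1
    return counts


# Toy lemmatised sentence (invented for illustration).
sentence = [("der", "ART"), ("alt", "ADJ"), ("Ahornblatt", "NN"),
            ("fallen", "VV"), ("von", "APPR"), ("Baum", "NN")]
noun_features = window_cooccurrences(sentence, {"Ahornblatt"}, 5, {"NN"})
```

Because the function only ever sees one sentence at a time, the window never crosses a sentence boundary, which matches the sentence-internal condition; a sentence-external variant would need to run over a continuous token stream instead.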

16 Window-based VSMs: Results NN > NN+ADJ+VV > VV > ADJ (significant) window sizes: 100 = > 10 > 5 > 2 > 1 WebKo > sdewac (significant; also with sentence-internal windows) best result: ρ = (WebKo, NN, window size: 20)

17 Syntax-based VSMs. Hypothesis 1 (ii): syntax-based features outperform window-based features. Task: compare the two co-occurrence conditions. Setup: corpus choice: sdewac (parsed); specify the syntactic function; determine co-occurrence counts and calculate LMI values. Syntactic functions (VS features): nouns in verb subcategorisation: transitive and intransitive subjects; concatenation of both trans/intrans features (all subjects); direct objects; PP objects; noun-modifying adjectives; noun-modifying and noun-modified prepositions.

18 Syntax-based VSMs: Results

19 Syntax-based VSMs: Results. window-based > syntax-based; noun-modifying adjectives adjectives in window 20; verbs in window 20 > verb subcategorisation; best verb subcategorisation function: direct object; abstracting over subject (in)transitivity > specific functions; concatenation worse than the best individual functions.

20 Role of Modifiers vs. Heads (1). Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers. Perspective (i): salient features for compound-modifier vs. compound-head pairs. Setup: same as before (window-based and syntax-based); distinguish the evaluation of 244 compound-modifier predictions vs. 244 compound-head predictions (instead of abstracting over the constituent type, using all 488 predictions).

21 Role of Modifiers vs. Heads (1): Results for Windows window-based: NN > NN+ADJ+VV > VV > ADJ (same as before) window sizes: 20 > 10 > 5 > 2 > 1 (same as before) small windows: compound head > compound modifier predictions larger windows: difference vanishes

22 Role of Modifiers vs. Heads (1): Results for Syntax syntax-based: window-based > syntax-based (as before) compound head > compound modifier predictions (exception transitive subjects) patterns with regard to function types vary (in comparison to previous models, and for modifiers vs. heads)

23 Role of Modifiers vs. Heads (2). Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers. Perspective (ii): contribution of modifiers vs. heads to compound meaning. Setup: window-based, window 20, across parts-of-speech; correlate only one type of compound-constituent predictions with the compound whole ratings; apply addition/multiplication, in correspondence to the upper bound.
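The addition/multiplication step can be sketched in Python as below, assuming the compound-modifier and compound-head predictions are available as two parallel lists of cosine values, one entry per compound. The function name `combine` and the list-based layout are hypothetical; the combined scores would then be correlated with the compound whole ratings using Spearman's ρ.

```python
def combine(modifier_scores, head_scores, how):
    """Combine per-compound modifier and head predictions.

    how: "addition" or "multiplication"
    Both input lists must be aligned, one entry per compound.
    """
    if len(modifier_scores) != len(head_scores):
        raise ValueError("score lists must have the same length")
    if how == "addition":
        return [m + h for m, h in zip(modifier_scores, head_scores)]
    if how == "multiplication":
        return [m * h for m, h in zip(modifier_scores, head_scores)]
    raise ValueError("how must be 'addition' or 'multiplication'")


# Toy cosine values (invented for illustration).
modifier_cosines = [0.5, 0.25]
head_cosines = [0.25, 0.5]
summed = combine(modifier_cosines, head_cosines, "addition")
multiplied = combine(modifier_cosines, head_cosines, "multiplication")
```

Using only `modifier_cosines` or only `head_cosines` as the prediction corresponds to the "modifiers only" and "heads only" conditions.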

24 Role of Modifiers vs. Heads (2): Results impact of distributional semantics: modifiers > heads multiplication modifiers only multiplication > addition

25 Summary. Hypothesis 1 (i): against our intuition, not adjectives or verbs but nouns provided the most salient distributional information. Hypothesis 1 (ii): syntax-based predictions were all worse than or the same as predictions by the respective window-based parts-of-speech. Best Model: nouns within a 20-word window (ρ = ).

26 Summary. Hypothesis 2 (i): salient features to predict similarities between compound-modifier vs. compound-head pairs are different; small windows: distributional similarity between compounds and heads > compounds and modifiers, but the difference vanishes in larger contexts. Hypothesis 2 (ii): the influence of modifier meaning on compound meaning is stronger than the influence of head meaning, in human ratings and in VSMs. Future Work: learn more about the semantic role of modifiers vs. heads in noun-noun compounds (as do Gagné and Spalding, 2009; 2011, among others).

27 Compositionality Ratings: Distribution (2)

28 Window-based VSMs: Results. Context windows: sentence-internal (sdewac, nouns only) vs. sentence-external (WebKo, nouns only).
