Can Human Verb Associations help identify Salient Features for Semantic Verb Classification?


Sabine Schulte im Walde
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
Seminar für Sprachwissenschaft, Universität Tübingen
November 13, 2006

Semantic Verb Classifications

Examples: Semantic Verb Classifications
Various instantiations of semantic similarity, e.g.
  » syntax-semantics alternation behaviour (Levin, 1993): buy, catch, earn, find, steal, ... (obtaining: get verbs with benefactive alternation)
  » synonymy (WordNet): buy, purchase (sub-class of get/acquire verbs)
  » situation-based agreement (FrameNet): buy, purchase (commerce_buy) inherits from acquire, gain, get, obtain, procure, secure (getting); commercial transaction with buyer, goods, etc.
Sabine Schulte im Walde / SfS Tübingen, Nov. 2006

Creation of Semantic Verb Classes
Resource-intensive vs. automatic methods
Classification and clustering parameters: verbs, classes, algorithm, features, etc.
Features model semantic similarity of interest
Example of automatic method:
  » Merlo & Stevenson (CL Journal, 2001): classify 60 English verbs which alternate between intransitive and transitive usage into three classes; features model syntactic frame alternation proportions and heuristics for semantic role assignment

Semantic Verb Classes: Features
Features for larger-scale classifications with similarity at the syntax-semantics interface: behaviour
Potentially salient features:
  » syntactic frames
  » prepositional phrases
  » argument role fillers
  » adverbial adjuncts, etc.
Granularity of features

Human Associations and Semantic Verb Classifications

Associations: Guide to Feature Selection
Basis: semantic associates, concepts spontaneously called to mind by a stimulus word
Idea: human associations to identify salient features
Assumptions:
  » associations reflect linguistic and conceptual features and therefore model verb meaning aspects
  » theory-independent
  » variety of semantic verb relations
  » guidance to feature selection

Goals
Insights into the usefulness of standard feature types in verb clustering (e.g., direct object)
Exploring additional feature types, e.g., assessment of low-level window co-occurrence vs. higher-order syntactic frame fillers
Variation of corpus-based features by corpus frequency
Are the same types of features salient for different types of semantic verb classes?

Procedure
1. Collection of human verb associations
2. Association-based verb classes (assoc-classes)
3. Validation against GermaNet and FrameNet
4. Analysis of empirical properties of verb associations and transfer of insights to the selection of feature types
5. Hierarchical clustering with corpus-based features (corpus-classes)
6. Comparison of corpus-classes against assoc-classes
7. Evaluation of goals

Human Verb Associations: Collection and Analysis
Joint work with Alissa Melinger and Katrin Erk.

Web Experiment: Material
330 German verbs
Variety of semantic verb classes, possible ambiguity:
  » self-motion: gehen 'walk', schwimmen 'swim'
  » cause: verbrennen 'burn', reduzieren 'reduce'
  » experiencing: lachen 'laugh', überraschen 'surprise'
  » communication: erzählen 'tell', klagen 'complain'
  » body: schlafen 'sleep', abnehmen 'lose weight'
Variety of frequency ranges (1 < freq < 71,604)
Random distribution: 6 data sets of 55 verbs each, balanced for class affiliation and frequency ranges

Web Experiment: Procedure
[Screenshot of the experiment interface showing the words schneien 'snow', kalt 'cold', rodeln 'sledge', Schneemann 'snowman', weiß 'white', and dämmern 'dawn']

Web Experiment: Data
299 accepted data files
Participants per data set: between 44 and 54
Number of trials: 16,445
Number of associations per target verb: range 0-16, average: 5.16
Responses: 79,480 tokens for 39,254 types

Quantification over Association Types
klagen 'complain, moan, sue'
  Association                       Frequency
  Gericht 'court'                   19
  jammern 'moan'                    18
  weinen 'cry'                      13
  Anwalt 'lawyer'                   11
  Richter 'judge'                   9
  Klage 'complaint, lawsuit'        7
  Leid 'suffering'                  6
  Trauer 'mourning'                 6
  Klagemauer 'Wailing Wall'         5
  laut 'noisy'                      5
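The quantification step amounts to counting, for each target verb, how often each response type was given. The following is a minimal sketch under stated assumptions: the input format (a list of verb/response pairs), the function name `quantify_associations`, and the toy data are all hypothetical and not taken from the experiment files. Counting types per target verb is likewise an assumption about how the token/type figures were obtained.

```python
from collections import Counter, defaultdict


def quantify_associations(trials):
    """Count response tokens per association type for each target verb.

    `trials` is an iterable of (target_verb, response) pairs; this input
    format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for verb, response in trials:
        counts[verb][response] += 1
    return counts


# Toy data (hypothetical), not the experiment's actual responses.
trials = [
    ("klagen", "Gericht"),
    ("klagen", "jammern"),
    ("klagen", "Gericht"),
    ("klagen", "weinen"),
    ("jammern", "weinen"),
]

counts = quantify_associations(trials)

# Tokens: every single response; types: distinct responses per target verb.
tokens = sum(sum(c.values()) for c in counts.values())
types = sum(len(c) for c in counts.values())

# Ranked list for one verb, most frequent association first.
ranked_klagen = counts["klagen"].most_common()
```

Sorting each verb's counter with `most_common()` gives a ranked list of the same shape as the klagen example.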

Linguistic Analyses of Experiment Data
Preference for morpho-syntactic category of responses?
  » distinguish major parts-of-speech: nouns, verbs, adjectives, adverbs
Typical argument holders of verb valency?
  » investigate linguistic functions realised by nouns: empirical grammar model
Common appearance in corpus data?
  » determine co-occurrence of target and response: German newspaper corpus, 200 million words

Excursus: Statistical Grammar Model
Head-lexicalised probabilistic context-free grammar (Charniak, 1997; Carroll and Rooth, 1998)
35 million words of German newspaper corpora
Unsupervised training by EM-Algorithm (Baum, 1972)
Robust statistical parser LoPar (Schmid, 2000)
Corpus-based quantitative lexical information: word frequencies, linguistic functions, head-head relations

Morpho-Syntactic Distribution
                    V         N         ADJ       ADV
  TOKEN  Freq       19,863    48,905    8,510     1,268
         Prob (%)   25        62        11        2
  TYPES  Freq       9,317     23,524    4,983     802
         Prob (%)   24        61        13        2

Syntax-Semantic Functions of Nouns
Source: statistical grammar model
Verb valency:
  » 38 syntactic subcategorisation frames
  » plus PP information (case+preposition): 178 frames
  » subcategorised nouns
Example: backen 'bake'
  » frames: NP_nom, NP_nom NP_acc, ...
  » filler examples for NP_nom [NP_acc]: Brot 'bread', Kuchen 'cake'

Syntax-Semantic Functions: Analysis
Look up syntactic relationships between verb and nouns
Typical conceptual roles which speakers have in mind
Example: backen
  Noun associations (frequency): Kuchen (45), Brot (18), Plätzchen (10), Bäcker (8), Brötchen (8), Pizza (3), Mutter (1)
  Functions:
    [NP_nom]           = 40.5
    [NP_nom] NP_acc    = 9
    NP_nom [NP_acc]    = 43.5
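The slide does not say how an association frequency is divided when a noun can fill more than one function of the verb. One reading that reproduces the three totals (40.5, 9, 43.5) is an even split over all functions in which the noun is found. The sketch below illustrates that reading only. The noun-to-function assignment in `noun_functions` is an assumption chosen so that the arithmetic works out; it is not the grammar model's actual output, and the function name is hypothetical.

```python
from collections import defaultdict


def distribute_over_functions(assoc_freq, noun_functions):
    """Split each noun's association frequency evenly over its functions.

    Even splitting is an ASSUMPTION; the slide only shows the totals.
    """
    totals = defaultdict(float)
    for noun, freq in assoc_freq.items():
        functions = noun_functions.get(noun, [])
        for function in functions:
            totals[function] += freq / len(functions)
    return dict(totals)


# Association frequencies for backen, as listed on the slide.
assoc_freq = {
    "Kuchen": 45, "Brot": 18, "Plätzchen": 10, "Bäcker": 8,
    "Brötchen": 8, "Pizza": 3, "Mutter": 1,
}

SUBJ_INTRANS = "[NP_nom]"        # subject of the intransitive frame
SUBJ_TRANS = "[NP_nom] NP_acc"   # subject of the transitive frame
OBJ_TRANS = "NP_nom [NP_acc]"    # direct object of the transitive frame

# ASSUMPTION: hypothetical lookup result, chosen to match the slide's totals.
noun_functions = {
    "Kuchen": [SUBJ_INTRANS, OBJ_TRANS],
    "Brot": [SUBJ_INTRANS, OBJ_TRANS],
    "Plätzchen": [SUBJ_INTRANS, OBJ_TRANS],
    "Brötchen": [SUBJ_INTRANS, OBJ_TRANS],
    "Pizza": [OBJ_TRANS],
    "Bäcker": [SUBJ_TRANS],
    "Mutter": [SUBJ_TRANS],
}

totals = distribute_over_functions(assoc_freq, noun_functions)
```

Under these assumptions the three totals sum to 93, the total number of noun association tokens listed for backen.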

Functions: Distributions
  Function            Frame            TOKEN     %
  S                   S V              1,892     4
  S                   S V AO           1,054     2
  S                   S V DO           291       1
  S                   S V PP           608       1
  AO                  S V AO           3,239     7
  AO                  S V AO DO        840       2
  AO                  S V AO PP        692       1
  DO                  S V DO           270       1
  DO                  S V AO DO        476       1
  PP                  S V PP:in Dat    487       1
  Unknown noun                         10,663    22
  Unknown function                     24,536    50

Window Co-Occurrence across POS
Corpus data: 200 million word newspaper text
Window (left+right): 5/20 words, excluding symbols
Basis: association tokens
Distinction with respect to window frequency:
                 window frequency
  window size    1     2     3     5     10    20    50
  5              66    56    50    42    33    23    14
  20             77    70    66    59    50    40    27

Window Co-Occurrence Verb-Noun
Corpus data: 200 million word newspaper text
Window (left+right): 5/20 words, excluding symbols
Basis: association tokens
Distinction with respect to window frequency:
                 window frequency
  window size    1     2     3     5     10    20    50
  5              66    56    50    43    34    24    14
  20             76    69    66    59    50    40    27

Window Co-Occurrence Verb-Adverb
Corpus data: 200 million word newspaper text
Window (left+right): 5/20 words, excluding symbols
Basis: association tokens
Distinction with respect to window frequency:
                 window frequency
  window size    1     2     3     5     10    20    50
  5              84    78    73    67    55    43    30
  20             91    88    85    80    72    62    50
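The window analysis on these slides can be sketched as two steps: count which words occur within n tokens to the left or right of a target verb, then check what share of the association tokens reaches a given co-occurrence frequency. This is a minimal sketch under stated assumptions: the corpus is a pre-tokenised list with symbols already removed, no lemmatisation is applied, and the function names, the toy sentence, and the toy association counts are all hypothetical. Reading the table values as "share of association tokens with at least the given window frequency" is also an assumption.

```python
from collections import Counter


def window_cooccurrence(tokens, target, window=5):
    """Count words within `window` tokens left or right of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def coverage(assoc_freq, cooc, min_freq=1):
    """Share of association tokens whose response co-occurs >= min_freq times."""
    total = sum(assoc_freq.values())
    if total == 0:
        return 0.0
    covered = sum(f for resp, f in assoc_freq.items() if cooc[resp] >= min_freq)
    return covered / total


# Toy corpus and toy association counts (hypothetical).
corpus = ("es wird kalt und beginnt zu schneien "
          "dann bauen wir einen Schneemann").split()
assoc_freq = {"kalt": 3, "Schneemann": 2, "rodeln": 1}

cooc_5 = window_cooccurrence(corpus, "schneien", window=5)
cooc_2 = window_cooccurrence(corpus, "schneien", window=2)

cov_5_at_1 = coverage(assoc_freq, cooc_5, min_freq=1)
cov_5_at_2 = coverage(assoc_freq, cooc_5, min_freq=2)
cov_2_at_1 = coverage(assoc_freq, cooc_2, min_freq=1)
```

In the toy data, widening the window raises coverage and raising the frequency threshold lowers it, which is the same direction of change as in the tables.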

Association Analysis: Summary
Morpho-syntactic distribution: nouns dominate
Nouns represent (prominent) argument roles of verbs
Scene information in addition to subcategorisation; co-occurrence counts to supplement argument counts
Strong co-occurrence of verbs and adverb responses
Results depend on verb frequency and semantic class
Usage of roles and window-based nouns for distributional verb descriptions

Association-based Verb Classes: Creation and Validation

Association Overlap
klagen / jammern 'moan'
  Association            klagen / jammern
  Frauen 'women'         2 / 3
  Leid 'suffering'       6 / 3
  Schmerz 'pain'         3 / 7
  Trauer 'mourning'      6 / 2
  bedauern 'regret'      2 / 2
  beklagen 'bemoan'      4 / 3
  heulen 'cry'           2 / 3
  nervig 'annoying'      2 / 2
  nölen 'moan'           2 / 3
  traurig 'sad'          2 / 5
  weinen 'cry'           13 / 9
overlap: 35 types

Association-based Clustering
Agglomerative (bottom-up) hierarchical clustering
Similarity measure: skew divergence
Merging criterion: Ward's method (sum-of-squares)
Hierarchy cut: 100 classes
Cluster analysis informs about
  » classes
  » verbs
  » class features, i.e. associations
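The similarity measure named on this slide can be illustrated in isolation. Skew divergence smooths the Kullback-Leibler divergence by mixing the second distribution with the first, so that it stays finite when a response occurs for one verb but not for the other. The sketch below is illustrative only: the value alpha = 0.9 is an assumption (the slide gives no value), the function names are hypothetical, and the two distributions use only four of the shared klagen/jammern associations from the overlap slide, so they are partial. The clustering step itself (Ward's method, cut at 100 classes) is not shown.

```python
import math


def normalise(counts):
    """Turn raw association counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def skew_divergence(p, q, alpha=0.9):
    """Skew divergence: KL(p || alpha * q + (1 - alpha) * p).

    alpha = 0.9 is an ASSUMED setting; the slide does not state one.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        mix = alpha * q.get(x, 0.0) + (1.0 - alpha) * px
        total += px * math.log(px / mix)
    return total


# Partial distributions over four shared associations (illustration only).
klagen = normalise({"Leid": 6, "Schmerz": 3, "Trauer": 6, "weinen": 13})
jammern = normalise({"Leid": 3, "Schmerz": 7, "Trauer": 2, "weinen": 9})

d_kj = skew_divergence(klagen, jammern)
d_jk = skew_divergence(jammern, klagen)
d_kk = skew_divergence(klagen, klagen)
```

The measure is asymmetric, so `d_kj` and `d_jk` generally differ; a clustering setup has to fix one direction or combine both.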

Association-based Example Classes
Class: bedauern 'regret', heulen 'cry', jammern 'moan', klagen 'complain, moan, sue', verzweifeln 'become desperate', weinen 'cry'
Features: Trauer 'mourning', weinen 'cry', traurig 'sad', Tränen 'tears', jammern 'moan', Angst 'fear', Mitleid 'pity', Schmerz 'pain', etc.
Class: abnehmen 'lose weight', abspecken 'lose weight', zunehmen 'gain weight'
Features: Diät 'diet', Gewicht 'weight', dick 'fat', abnehmen 'lose weight', Waage 'scale', Essen 'food', essen 'eat', Sport 'sports', dünn 'thin', Fett 'fat', etc.

Validation
Claim: A clustering based on verb associations and a standard setup compares well with existing semantic classes.
Lexical semantic resources:
  » GermaNet (Kunze, 2000)
  » Salsa / FrameNet (Erk et al., 2003)
Extraction of sub-classifications of resources:
  » GermaNet: 33 classes with 56 verbs (71 senses)
  » FrameNet: 49 classes with 104 verbs (220 senses)
Hierarchical clustering of verb subsets; pair-wise evaluation (Hatzivassiloglou/McKeown, 1993): are verbs v1, v2 that share a cluster also together in the gold standard?
  » GermaNet: 62.69% (upper bound: 82.35%)
  » FrameNet: 34.68% (upper bound: 49.90%)
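The pair-wise check can be sketched as follows: collect all verb pairs that share a cluster, collect all verb pairs that share a gold-standard class, and report the share of cluster pairs that are also gold pairs. This is a sketch under stated assumptions, not the evaluation code behind the reported figures: the slide does not say whether its percentages are precision, recall, or a combination, nor how the upper bounds were computed. The function names and toy classes are hypothetical.

```python
from itertools import combinations


def same_class_pairs(classes):
    """Return all unordered verb pairs that share at least one class."""
    pairs = set()
    for members in classes:
        for v1, v2 in combinations(sorted(set(members)), 2):
            pairs.add((v1, v2))
    return pairs


def pairwise_precision(clusters, gold_classes):
    """Share of same-cluster pairs that are also same-class pairs in the gold standard."""
    cluster_pairs = same_class_pairs(clusters)
    if not cluster_pairs:
        return 0.0
    gold_pairs = same_class_pairs(gold_classes)
    return len(cluster_pairs & gold_pairs) / len(cluster_pairs)


# Toy clustering and toy gold standard (hypothetical).
clusters = [
    ["klagen", "jammern", "weinen"],
    ["abnehmen", "zunehmen"],
]
gold = [
    ["klagen", "jammern"],
    ["weinen", "heulen"],
    ["abnehmen", "zunehmen", "abspecken"],
]

precision = pairwise_precision(clusters, gold)
```

Because pairs are collected per class, an ambiguous verb that belongs to several gold classes contributes pairs from each of them.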

Association-based Classes: Summary
Considerable overlap between association-based classes and the lexical resources GermaNet and FrameNet
Differences in validation for GermaNet vs. FrameNet:
  » types of semantic similarity
  » degrees of ambiguity
  » clustering parameters: number of verbs, etc.
Potential use of association-based classes as gold standard for clustering experiments
Associations provide guidance to feature selection

Exploring Semantic Class Features

Exploring Semantic Class Features
Grammar-based relations from statistical grammar: verb-noun pairs with nominal heads of NPs and PPs, verb-adverb pairs from adverbial modifiers
Co-occurrence window: 200-million word newspaper corpus, 20-word window (left and right)

Exploring Semantic Class Features
grammar relations
              n         na        na        NP        PP        NP&PP     ADV
  features    12,635    14,458    13,416    20,792    14,513    22,366    10,080
  cov. (%)    3.82      4.32      6.93      12.23     5.36      14.08     3.63
co-occurrence: window-20
              all       cut       ADJ       ADV       N         V
  features    934,783   100,305   96,178    5,688     660,403   34,095
  cov. (%)    66.15     57.79     9.13      1.72      39.27     15.51

Corpus-based Clustering
Experiment verbs: agglomerative hierarchical clustering, evaluation against assoc-classes: accuracy
GermaNet: random selection of 100 synsets, random hard version with 233 verbs, clustering and evaluation as above
FrameNet: pre-release version from May 2005, random hard version with 406 verbs in 77 classes, clustering and evaluation as above

Corpus-based Clustering: Results
           frames            grammar relations
           f-pp     f-pref   n        na       na       NP       PP       NP&PP    ADV
  Assoc    37.50    37.80    35.90    37.18    39.25    39.14    37.97    41.28    38.53
  GN       46.98    49.14    58.01    53.37    51.90    53.10    54.21    51.77    51.82
  FN       33.50    32.76    29.46    30.13    32.74    34.16    28.72    33.91    35.24
co-occurrence: window-20
           all      cut      ADJ      ADV      N        V
  Assoc    39.33    39.45    37.31    36.89    39.33    38.84
  GN       51.53    52.42    50.88    47.79    52.86    49.12
  FN       32.01    32.84    31.08    31.00    34.24    31.75

Corpus-based Clustering: Results (annotations added on repeated showings of the same results tables)
  » no correlation!
  » no significant difference!
  » significant difference!

Properties of Gold Standard Verb Classes
                                         no. of verbs with freq
           verbs    average verb freq    < 50     < 20     < 10
  Assoc    330      2,465                41       16       8
  GN       233      1,040                98       65       40
  FN       406      1,876                54       16       11

Summary of Results
No correlation between overlap of associations / feature types and respective clustering results (Pearson's correlation, p > .1)
Window-based features are not significantly worse than selected grammar-based functions; applying cut-offs has almost no impact
Several cases of grammar-based and window-based features outperform frame-based features (i.e., previous work)
Adverbs outperform frame-based features, even some nominals
Most successful feature types vary for gold standards
Significantly better results for GermaNet clusterings than for experiment-based and FrameNet clusterings

Outlook
Which feature types are appropriate to model human associations?
Which types of (semantic) verb classifications rely on which types of features?
Which classification parameters (e.g., size of classes, ambiguity of verbs, empirical properties of verbs) influence the clustering result?
How do the features and parameters differ with respect to a specific semantic verb class?