
Institut für Computerlinguistik, Uni Zürich: Effiziente Analyse unbeschränkter Texte (Efficient Analysis of Unrestricted Texts)

Lecture 10: Evaluation

Gerold Schneider
Institute of Computational Linguistics, University of Zurich
Department of Linguistics, University of Geneva
gschneid@ifi.unizh.ch

December 15, 2003

Contents

1. Traditional Syntactic Evaluation: Labeled Bracketing
2. Dependency-Based Evaluation: Lin 1995
3. An Annotation Scheme for Evaluation: Carroll et al. f.c.
4. First evaluation: tgrep-extraction-based
5. Second evaluation: Mapping to Carroll et al.
6. Current Evaluation Results
7. Comparison to Related Work
8. Gradience: A Selection of Problematic Cases

1 Traditional Syntactic Evaluation: Labeled Bracketing

See Jurafsky & Martin 2000: 464; PARSEVAL: Black et al. 1991.

labeled precision = (# of correct constituents in the candidate) / (# of all constituents in the candidate)

labeled recall = (# of correct constituents in the candidate) / (# of constituents in the gold standard)

cross-brackets = # of brackets crossing between candidate and gold standard
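To make the three measures concrete, here is a minimal Python sketch; the encoding of a constituent as a (label, start, end) tuple over word indices and the function name parseval are my own illustration, not part of the lecture.

def parseval(candidate, gold):
    """Labeled precision, labeled recall and crossing-brackets count."""
    cand, gold = set(candidate), set(gold)
    correct = cand & gold
    precision = len(correct) / len(cand)
    recall = len(correct) / len(gold)
    # A candidate bracket crosses the gold standard if some gold span
    # overlaps it without either span containing the other.
    crossing = sum(
        1
        for (_, s1, e1) in cand
        if any(s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1 for (_, s2, e2) in gold)
    )
    return precision, recall, crossing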

2 Dependency-Based Evaluation: Lin 1995

PARSEVAL may count a single error multiple times:

a. [I [saw [[a man] [with [[a dog] and [a cat]]]] [in [the park]]]] (let this be the gold standard)
b. [I [saw [[a man] [with [[a dog] and [[a cat] [in [the park]]]]]]]]

One error, a PP-attachment to "cat" instead of "saw", but three crossing brackets:

1. [a dog and a cat] vs. [a cat in the park]
2. [with a dog and a cat] vs. [a dog and a cat in the park]
3. [a man with a dog and a cat] vs. [with a dog and a cat in the park]

Recall: 6/10. Precision: 7/11.

c. [I [saw [a man] with [a dog] and [a cat] [in [the park]]]]

A very shallow, insufficient analysis, but no crossing brackets. Recall: 7/10. Precision: 7/7.
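The contrast becomes obvious when the same example is scored on head-dependent arcs instead of brackets. A minimal sketch, showing only the arcs relevant to the attachment decision, in an illustrative encoding of my own:

# Each analysis is a set of (dependent, head) arcs.
gold = {("man", "saw"), ("with", "man"), ("in", "saw")}
cand = {("man", "saw"), ("with", "man"), ("in", "cat")}  # candidate b. above

wrong = {arc for arc in gold if arc not in cand}
print(len(wrong))  # 1 -- the single attachment error is counted exactly once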

Desiderata:

- Selective evaluation, depending on the syntactic phenomenon
- Ability to ignore inconsequential differences
- Facilitate error diagnostics
- Evaluation based on grammatical relations instead of constituency!

3 An Annotation Scheme for Evaluation: Carroll et al. f.c.

3.1 More PARSEVAL problems

- Low agreement between parsing schemes for some constructions.
- Partial PARSEVAL answer: remove certain bracketing information from consideration: negation, auxiliaries, punctuation, traces.
- Serious mapping problems to different annotation schemes remain:

"The treebanks have been constructed with reference to sets of informal guidelines indicating the type of structures to be assigned. In the absence of a formal grammar controlling or verifying the manual annotations, the number of different structural configurations tends to grow without check. For example, the [Penn Treebank] implicitly contains more than 10000 distinct context-free productions, the majority occurring only once."

Further PARSEVAL problems:

- It penalises parsers that return more information than is contained in the treebank.
- It cannot be applied to dependency-based parsers.
- For cascaded systems, different levels cannot be distinguished (chunking vs. parsing in my case).

3.2 The Carroll et al. annotation hierarchy

(Carroll et al. f.c.: 303; the subj_or_dobj relation is left out)

dependent
  mod (modification, adjunct)
    ncmod (non-clausal)
    xmod (clausal, control)
    cmod (clausal, no control)
  arg_mod (passive agent)
  arg (argument, complement)
    subj
      ncsubj (non-clausal)
      xsubj (clausal, control)
      csubj (clausal, no control)
    comp
      obj
        dobj (first object)
        obj2 (second object)
        iobj (prepositional)
      clausal
        xcomp (control)
        ccomp (no control)

control vs. no control: He_1 wants [t_1 to leave] (control) vs. He says [that she left] (no control).
"nc" actually means non-clausal, but that mostly amounts to nominal, incl. prepositional!
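The hierarchy can be read as a subsumption relation: a relation found by the parser can also be scored at any coarser level, which is how the per-level figures in section 7.3 arise. A minimal Python sketch of my own (the dict encoding and function name are illustrative, not part of the scheme):

# Parent links for the Carroll et al. GR hierarchy shown above.
PARENT = {
    "ncmod": "mod", "xmod": "mod", "cmod": "mod",
    "mod": "dependent", "arg_mod": "dependent", "arg": "dependent",
    "subj": "arg", "comp": "arg",
    "ncsubj": "subj", "xsubj": "subj", "csubj": "subj",
    "obj": "comp", "clausal": "comp",
    "dobj": "obj", "obj2": "obj", "iobj": "obj",
    "xcomp": "clausal", "ccomp": "clausal",
}

def subsumes(coarse, fine):
    """True if `fine` is `coarse` itself or lies below it in the hierarchy."""
    while fine is not None:
        if fine == coarse:
            return True
        fine = PARENT.get(fine)
    return False

assert subsumes("subj", "ncsubj") and not subsumes("obj", "ncsubj")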

The GRs are encoded as Lisp/Prolog facts; the test set consists of 500 random sentences from the Susanne corpus. Examples:

ncmod(_, flag, red).        % a red flag
ncmod(on, flag, roof).      % flag on the roof
xmod(without, eat, ask).    % he ate the cake without asking
cmod(because, eat, be).     % he ate the cake because he was hungry
arg_mod(by, kill, Brutus).  % killed by Brutus
ncsubj(she, eat, _).        % she was eating
xsubj(win, require, _).     % to win the America's Cup requires heaps of cash
csubj(leave, mean, _).      % that Nellie left meant she was angry
dobj(read, book, _).        % read books
dobj(mail, Mary, iobj).     % mail Mary the contract (3rd arg is the initial GR)
iobj(in, arrive, Spain).    % arrive in Spain
obj2(give, present, _).     % give Mary a present
xcomp(to, intend, leave).   % Paul intends to leave
xcomp(_, be, easy).         % swimming is easy
xcomp(in, be, Paris).       % Mary is in Paris
ccomp(that, say, leave).    % I said that he left
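Evaluation then amounts to comparing two collections of such facts. A minimal sketch of per-relation precision and recall, assuming a uniform (relation, slot, head, dependent) tuple encoding of my own (in the actual scheme the argument order varies by relation):

from collections import Counter

def gr_scores(cand, gold, relation):
    """Precision and recall for one GR type, counting duplicated facts."""
    c = Counter(f for f in cand if f[0] == relation)
    g = Counter(f for f in gold if f[0] == relation)
    correct = sum((c & g).values())  # multiset intersection
    return correct / sum(c.values()), correct / sum(g.values())

gold = [("ncmod", "on", "flag", "roof"), ("ncsubj", "", "eat", "she")]
cand = [("ncmod", "on", "flag", "roof"), ("ncsubj", "", "eat", "he")]
print(gr_scores(cand, gold, "ncmod"))   # (1.0, 1.0)
print(gr_scores(cand, gold, "ncsubj"))  # (0.0, 0.0)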

4 First evaluation: tgrep-extraction-based

- Using the grammatical relation (GR) data from the (held-out) section 00.
- Comparing the candidate-parse GR and the tgrep'd GR.
- While the theoretical idea is fine, practical mapping problems occur:
  - the tgrep patterns have (almost?) 100 % precision, but below 100 % recall (a complexity problem)
  - different grammatical assumptions (e.g. "in favour of", "some of the people")
- The results reported are thus about 5 % too low.

5 Second evaluation: Mapping to Carroll et al.

Mapping to Carroll et al. is not always 1:1, but quite straightforward. The naive direct mapping (C subscript for Carroll relations) is subj → ncsubj_C, obj → dobj_C, pobj → iobj_C, modpp → ncmod_C, etc.

This works only partly:

- no adjunct/complement distinction for my PPs
- Tesnière translations
- different grammatical assumptions (e.g. Carroll does not consider relative pronouns to be subjects)

The mapping thus becomes more involved, as follows.

Mapping for subjects and objects

Subject:
  Precision: subj OR modpart → ncsubj_C OR cmod_C (with relative pronoun)
  Recall: ncsubj_C → subj OR modpart
  (ncsubj_C = non-clausal subject; cmod_C = clausal modification, used for relative clauses)

Object:
  Precision: obj OR obj2 → dobj_C OR obj2_C
  Recall: dobj_C OR obj2_C → obj OR obj2
  (dobj_C = first object; obj2_C = second object)
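As an illustration of how such a mapping rule can be checked mechanically, here is a minimal sketch of the precision side of the subject mapping; the triple encoding, the relative-pronoun list and the function name are hypothetical, not from the lecture.

REL_PRONOUNS = {"who", "whom", "whose", "which", "that"}

def subj_precision_match(parser_rel, gold_rels):
    """parser_rel: ('subj' or 'modpart', head, dependent);
    gold_rels: a set of (relation, head, dependent) triples."""
    _, head, dep = parser_rel
    # Correct if the gold standard has a matching ncsubj, or a cmod
    # relating the same words via a relative pronoun.
    return (("ncsubj", head, dep) in gold_rels
            or (("cmod", head, dep) in gold_rels and dep.lower() in REL_PRONOUNS))

gold = {("ncsubj", "eat", "she")}
print(subj_precision_match(("subj", "eat", "she"), gold))  # True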

Mapping for PP-attachment

noun-PP:
  Precision: modpp → ncmod_C (with prep) OR xmod_C (with prep)
  Recall: ncmod_C (with prep) OR xmod_C (with prep) → modpp
  (ncmod_C = non-clausal modification; xmod_C = clausal modification, for verb-to-noun translations)

verb-PP:
  Precision: pobj → iobj_C (with prep) OR arg_mod_C OR ncmod_C (with prep OR (prt & dobj)) OR xcomp_C (with prep)
  Recall: iobj_C (with prep) OR arg_mod_C OR xcomp_C (with prep) → pobj
  (iobj_C = prepositional object; arg_mod_C = passive agent; xcomp_C for PP-attachment to copular verbs)

6 Current Evaluation Results

Precision and recall measures:

  subj precision     828 of 946   87.5 %
  subj recall        767 of 956   80.2 %
  obj precision      430 of 490   87.7 %
  obj recall         316 of 391   80.8 %
  noun-PP precision  343 of 479   71.6 %
  verb-PP precision  350 of 482   72.6 %
  ncmod recall       593 of 801   74.0 %
  iobj recall        132 of 157   84.0 %
  arg_mod recall      30 of  41   73.1 %

Table 1: Pre-current evaluation of the fully lexicalized, backed-off system output

Current evaluation and comparison, percentage values:

               Subject   Object   noun-PP   verb-PP
  Precision       91       89        73        74
  Recall          81       83        67        83

Current selective long-distance dependency (LDD) evaluation (as far as the annotations permit). LDD relation results:

  WH-subject precision                            57/62     92 %
  WH-subject recall                               45/50     90 %
  WH-object precision                              6/10     60 %
  WH-object recall                                  6/7     86 %
  Anaphor of the rel. clause subject, precision   41/46     89 %
  Anaphor of the rel. clause subject, recall      40/63     63 %
  Passive subject recall                         132/160    83 %
  Subject-control subject precision               40/50     80 %
  Object-control subject precision                  5/5    100 %
  modpart relation precision                      34/46     74 %
  Topicalized verb-attached PP precision          25/35     71 %

7 Comparison to Related Work

7.1 Comparison to Lin

This system, percentage values:

               Subject   Object   noun-PP   verb-PP
  Precision       91       89        73        74
  Recall          81       83        67        83

Lin (on the whole Susanne corpus):

               Subject   Object   PP-attachment
  Precision       89       88          78
  Recall          78       72          72

7.2 Comparison to Buchholz, and to Charniak (according to Preiss)

This system, percentage values:

               Subject   Object   noun-PP   verb-PP
  Precision       91       89        73        74
  Recall          81       83        67        83

Buchholz; Charniak (according to Preiss):

               Subject (ncsubj)   Object (dobj)
  Precision         86; 82            88; 84
  Recall            73; 70            77; 76

7.3 Comparison to Carroll's parser

Only the numbers in bold can be compared.

  Relation       Precision   Recall
  dependent          75         75
  +mod               74         70
  ++ncmod            78         73
  ++xmod             70         52
  ++cmod             67         48
  +arg_mod           84         41
  +arg               77         84
  ++subj             84         88
  +++ncsubj          85         88
  +++xsubj          100         40
  +++csubj           14        100
  ++comp             70         79
  +++obj             68         79
  ++++dobj           86         84
  ++++obj2           39         84
  ++++iobj           42         65
  +++clausal         73         78
  ++++xcomp          84         79
  ++++ccomp          72         75

8 Gradience: A Selection of Problematic Cases

Inter-annotator agreement in the Carroll test corpus is around 95 %.

Figure 1: Aberrant but intentional analysis

csubj(be, rescue, _, 92).  % gold standard; parser: no analysis (what = that, which)

... the measure would provide means of enforcing the law ...
ncsubj(enforce, measure, _, 21).
How far can control reach?

... there is nothing left of the conservative party ...
ncsubj(nothing, leave, obj, 71).   % gold standard: there-movement
modpart(nothing, leave, _, !, 71). % parser analysis: reduced relative

... prove [one of the difficult problems] ...
dobj(prove, one, _, 48).           % gold standard: syntactic analysis
obj(prove, problem, _, !, 48).     % parser analysis: hyperclever semantic chunker, wrong head extraction?

PP-attachment: ... brought enthusiastic responses from the audience ...
ncmod(from, response, audience, 11).  % gold standard: attachment to the noun
pobj(bring, audience, from, (!), 11). % parser: attachment to the verb

PP-attachment: ... (the government) made blunders in Cuba ...
ncmod(in, blunder, cuba, 51).         % gold standard: attachment to the noun
pobj(make, cuba, in, !, 51).          % parser: attachment to the verb

See the discussion of PP-attachment in Lecture 3.