The Proposition Bank

The Proposition Bank: An Annotated Corpus of Semantic Roles. TzuYi Kuo, EMLCT, Saarland University. June 14, 2010.

Outline: Introduction; Motivation; PropBank; Semantic Role; Framing; Annotation; Automatic Semantic-Role Labeling (Features, Algorithm, Evaluation); Conclusion.

Introduction: The aim is to represent the full meaning of sentences. Under an alternation, the syntactic realization of a semantic argument changes while its underlying semantic role stays the same.

Introduction: The Proposition Bank adds predicate-argument information to the Penn Treebank.

Introduction: PropBank focuses on the argument structure of verbs and provides a complete corpus annotated with semantic roles. Goals: provide a broad-coverage, hand-annotated corpus for supervised automatic role labelers, and show how and why these syntactic alternations take place.

Motivation: Inspired by Levin (1993), who researched the linking between semantic roles and their syntactic realization. Syntactic frames are taken to be a direct reflection of the underlying semantics. Verb classes are defined based on the ability of particular verbs to occur in particular syntactic frames.

Motivation: VerbNet (Kipper et al., 2000) extends Levin's classes by adding an abstract representation of the syntactic frames for each class, giving the correspondence between syntactic positions and the semantic roles they express. Example frames for "break": Agent REL Patient; Patient REL into pieces.

PropBank: From sentences to propositions. Sentences: "John met Mary."; "John and Mary met."; "John met with Mary."; "John and Mary had a meeting."; ... Proposition: meet(John, Mary).

Semantic Role: It is difficult to define a universal set of semantic roles covering all types of predicates, so roles are defined on a verb-by-verb basis. Arg0 is generally the Agent; Arg1 is the prototypical Patient.

Semantic Role: Verb-specific numbered roles. Example: Acceptor; Thing accepted; Accepted-from.

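Semantic Role: A verb may have several meanings (Meaning1, Meaning2). The roles for one meaning form a roleset; a roleset together with its syntactic frames forms a frameset; the framesets of a verb, with examples, make up its frames file. The frames attempt to cover the range of syntactic alternations afforded by the usage.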
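The roleset, frameset, and frames-file hierarchy can be pictured as a small data structure. This is only a sketch: the class and field names are hypothetical, the identifier "accept.01" is illustrative, and the assignment of Arg0, Arg1, and Arg2 to the three role descriptions is an assumption based on the numbering convention, since the slides list the descriptions without numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Frameset:
    """One major sense of a verb: its roleset plus tagged example sentences."""
    frameset_id: str                 # hypothetical identifier, e.g. "accept.01"
    roleset: dict[str, str]          # numbered argument -> verb-specific description
    examples: list[str] = field(default_factory=list)


@dataclass
class FramesFile:
    """The collection of framesets for one lexeme."""
    lemma: str
    framesets: list[Frameset] = field(default_factory=list)

    def describe(self, frameset_id: str, arg: str) -> str:
        """Look up the verb-specific description of a numbered argument."""
        for frameset in self.framesets:
            if frameset.frameset_id == frameset_id:
                return frameset.roleset[arg]
        raise KeyError(frameset_id)


# Assumed numbering of the roles listed on the slide.
accept = FramesFile(
    lemma="accept",
    framesets=[
        Frameset(
            frameset_id="accept.01",
            roleset={
                "Arg0": "Acceptor",
                "Arg1": "Thing accepted",
                "Arg2": "Accepted-from",
            },
        )
    ],
)

arg1_description = accept.describe("accept.01", "Arg1")
```

A polysemous verb would simply carry more than one Frameset in its list, one per major sense.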

Framing: Distinguishing framesets. Framesets are distinguished by different numbers of arguments, by verb-particle constructions, and by different syntactic types of object (NP versus clause).

Framing: Secondary predications.

Framing: Traces. An empty category, known as a trace, is coindexed with other constituents in the tree.

Frames file: The collection of framesets for each lexeme. Framing groups the usages of a verb into major senses (Major sense 1, Major sense 2), each with its own frameset (Frameset 1, Frameset 2).

Framing: In the Wall Street Journal data, over 3,300 verbs were framed and 4,500 framesets described, for an average polysemy of 1.36. Each instance of a polysemous verb is marked as to which frameset it belongs to, with an interannotator agreement (ITA) of 94%.

Development Process: Annotation. A rule-based argument tagger (Palmer, Rosenzweig, and Cotton 2001) first labels the data using class-based mappings between grammatical and semantic roles, with 83% accuracy. The output is then corrected by hand, by examining the descriptions of the arguments and the example tagged sentences.

Development Process: Annotation. The Kappa statistic (Siegel and Castellan, 1988) measures agreement between annotators. P(A) is the probability of inter-annotator agreement; P(E) is the agreement expected by chance.
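The standard form of the statistic is kappa = (P(A) - P(E)) / (1 - P(E)). The sketch below computes it for two annotators. The function names and the toy labels are illustrative only, and estimating P(E) from the pooled label distribution of both annotators is an assumption; the slides do not say how chance agreement was estimated.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """P(A): fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def chance_agreement(labels_a, labels_b):
    """P(E): chance agreement, assuming both annotators draw labels from the
    pooled label distribution (one common estimate, assumed here)."""
    pooled = Counter(labels_a) + Counter(labels_b)
    total = sum(pooled.values())
    return sum((count / total) ** 2 for count in pooled.values())


def kappa(p_a, p_e):
    """kappa = (P(A) - P(E)) / (1 - P(E))."""
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_a - p_e) / (1.0 - p_e)


# Toy labels, invented for illustration only.
annotator_1 = ["Arg0", "Arg1", "Arg1", "Arg0"]
annotator_2 = ["Arg0", "Arg1", "Arg0", "Arg0"]

p_a = observed_agreement(annotator_1, annotator_2)   # 3 of 4 items agree
p_e = chance_agreement(annotator_1, annotator_2)
k = kappa(p_a, p_e)
```

A kappa of 1 means perfect agreement, and a kappa of 0 means agreement no better than chance.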

Automatic Semantic Role Labeling: Examine the importance of syntactic information for semantic-role labeling by comparing the performance of a system based on gold-standard parses with one based on automatically generated parser output.

Automatic Semantic Role Labeling: Gildea and Jurafsky (2002) built a statistical system trained on data from the FrameNet project. Sentences are passed through an automatic parser (Collins, 1999), syntactic features are extracted from the parses, and probabilities for semantic roles are estimated from the syntactic and lexical features. Errors introduced by the parser no doubt negatively affected the results obtained.

Automatic Semantic Role Labeling: Features. Phrase type: the syntactic type of the phrase expressing the semantic role. Parse tree path: the path from the predicate through the parse tree to the constituent in question, which captures the syntactic relation of the constituent to the predicate.

Automatic Semantic Role Labeling: Features. Position: indicates whether the constituent to be labeled occurs before or after the predicate. Voice: distinguishes between active and passive, since direct objects of active verbs correspond to subjects of passive verbs. Headword: a lexical feature that provides information about the semantic type of the role filler.
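To make the parse tree path and position features concrete, here is a minimal sketch over a hand-built tree for "John met Mary". The Node class, the function names, the bracketing of the sentence, and the use of arrows to mark upward and downward steps are all assumptions made for illustration; the slides do not specify a data structure or a path notation.

```python
class Node:
    """A parse-tree node with a link back to its parent."""

    def __init__(self, label, children=(), word=None):
        self.label = label
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def tree_path(predicate, constituent):
    """Labels from the predicate up to the lowest common ancestor,
    then down to the constituent."""
    up = ancestors(predicate)
    down = ancestors(constituent)
    down_ids = {id(n) for n in down}
    i = next(k for k, n in enumerate(up) if id(n) in down_ids)
    j = next(k for k, n in enumerate(down) if n is up[i])
    path = "↑".join(n.label for n in up[: i + 1])
    below = [n.label for n in reversed(down[:j])]
    if below:
        path += "↓" + "↓".join(below)
    return path


def leaves(node):
    """Leaf nodes in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def position(root, predicate, constituent):
    """'before' if the constituent starts to the left of the predicate."""
    order = leaves(root)
    first = leaves(constituent)[0]
    return "before" if order.index(first) < order.index(predicate) else "after"


# Assumed bracketing: (S (NP (NNP John)) (VP (VBD met) (NP (NNP Mary))))
subject = Node("NP", [Node("NNP", word="John")])
verb = Node("VBD", word="met")
obj = Node("NP", [Node("NNP", word="Mary")])
sentence = Node("S", [subject, Node("VP", [verb, obj])])

subject_path = tree_path(verb, subject)   # "VBD↑VP↑S↓NP"
object_path = tree_path(verb, obj)        # "VBD↑VP↓NP"
```

The two NPs have the same phrase type, so it is the path and position that separate the subject from the object here.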

Automatic Semantic Role Labeling: Predicting argument roles. r_i: the role of constituent i in the sentence. F_i = {pt_i, path_i, pos_i, v_i, h_i}: the set of features at each constituent in the parse tree.

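Automatic Semantic Role Labeling: Predicting argument roles. P(r_i | pt_i, path_i, pos_i, v_i, h_i, p): the probability of a constituent's role given our five features for the constituent and the predicate p. P({r_1, ..., r_n} | p): the probability of a set of roles appearing in a sentence given a predicate.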
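One simple way to estimate a constituent's role distribution is by relative frequency over annotated examples. The sketch below does that and, when the full feature combination was never seen, falls back to the role distribution for the predicate alone. This backoff scheme, the function names, and the toy training tuples are all assumptions for illustration; the slides do not say how the probabilities were estimated or smoothed.

```python
from collections import Counter, defaultdict


def train(examples):
    """examples: iterable of (role, features, predicate), where features is a
    tuple (phrase type, path, position, voice, headword)."""
    by_features = defaultdict(Counter)   # (features, predicate) -> role counts
    by_predicate = defaultdict(Counter)  # predicate -> role counts
    for role, features, predicate in examples:
        by_features[(features, predicate)][role] += 1
        by_predicate[predicate][role] += 1
    return by_features, by_predicate


def role_distribution(model, features, predicate):
    """Relative-frequency estimate of P(r | features, p), backing off to
    P(r | p) when the feature combination is unseen (assumed scheme)."""
    by_features, by_predicate = model
    counts = by_features.get((features, predicate)) or by_predicate.get(predicate)
    if not counts:
        return {}
    total = sum(counts.values())
    return {role: count / total for role, count in counts.items()}


def predict(model, features, predicate):
    """Most probable role, or None if the predicate was never seen."""
    dist = role_distribution(model, features, predicate)
    return max(dist, key=dist.get) if dist else None


# Toy training data, invented for illustration only.
toy_examples = [
    ("Arg0", ("NP", "VBD↑VP↑S↓NP", "before", "active", "John"), "meet"),
    ("Arg1", ("NP", "VBD↑VP↓NP", "after", "active", "Mary"), "meet"),
    ("Arg1", ("NP", "VBD↑VP↓NP", "after", "active", "Sue"), "meet"),
]
model = train(toy_examples)

seen = predict(model, ("NP", "VBD↑VP↑S↓NP", "before", "active", "John"), "meet")
unseen = predict(model, ("NP", "VBD↑VP↓NP", "after", "active", "Ann"), "meet")
```

The unseen headword "Ann" triggers the fallback, so its prediction comes from the counts for "meet" as a whole rather than from the specific features.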

Automatic Semantic Role Labeling: Data. PropBank (preliminary release version): 72,109 predicate-argument structures containing 190,815 individual arguments, with examples from 2,462 lexical predicates (types). Testing data: Penn Treebank Section 23.

Automatic Semantic Role Labeling: Results. Two tasks are evaluated. In the first, the system is given the constituents which are arguments to the predicate and merely has to predict the correct role. In the second, it must find the arguments in the sentence and label them correctly. Table: Accuracy of semantic-role prediction (in percentages) for known boundaries.

Automatic Semantic Role Labeling: Results. Adding traces: traces provide hints as to the semantics of individual clauses. Table: Accuracy of semantic-role prediction (in percentages) for unknown boundaries (the system must identify the correct constituents as arguments and give them the correct roles).

Automatic Semantic Role Labeling: Results. Labeled recall: how often the semantic-role label is correctly identified. Unlabeled recall: how often a constituent with the given role is correctly identified as being a semantic role, even if it is labeled with the wrong role.
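The difference between the two recall measures can be shown with a short sketch. Representing arguments as a mapping from a (start, end) span to a role label is an assumption for illustration, and the gold and predicted annotations below are invented toy data.

```python
def recalls(gold, predicted):
    """Return (labeled recall, unlabeled recall).

    gold, predicted: dicts mapping a constituent span (start, end) to a role.
    """
    if not gold:
        raise ValueError("gold annotation must contain at least one argument")
    # Gold arguments that the system identified as arguments at all.
    found = [span for span in gold if span in predicted]
    unlabeled = len(found) / len(gold)
    # Of those, the ones that also received the correct role label.
    labeled = sum(predicted[span] == gold[span] for span in found) / len(gold)
    return labeled, unlabeled


# Toy annotations, invented for illustration only.
gold_args = {(0, 1): "Arg0", (2, 3): "Arg1", (4, 6): "Arg2"}
predicted_args = {(0, 1): "Arg0", (2, 3): "Arg2"}

labeled_recall, unlabeled_recall = recalls(gold_args, predicted_args)
```

Here two of the three gold arguments are found, but only one of them carries the right label, so unlabeled recall is 2/3 and labeled recall is 1/3.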

Automatic Semantic Role Labeling: The relation of syntactic parsing and semantic-role labeling. Chunks: a chunker does not build a full parse tree, which gives a large advantage in speed. Chunks contain basic-level constituent boundaries and labels, but no dependencies between constituents.

Conclusion: Consistent annotation has been achieved, bringing us one step closer to a detailed semantic representation. The WSJ is too domain-specific and too financial; genres with broader coverage are needed for more general annotation.

Future work: Add more informative thematic labels based on VerbNet. Map the annotation to FrameNet to merge the two annotated data sets. Explore machine-learning approaches. Integrate semantic-role labeling and sense tagging with the parsing process.

References
Levin, B. (1993). English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago.
Kipper, K., Dang, H. T., and Palmer, M. (2000). Class-based construction of a verb lexicon. Proceedings of the Seventeenth National Conference on Artificial Intelligence.