Abstract Meaning Representations for Sembanking


Abstract Meaning Representations for Sembanking University of Edinburgh March 4, 2016

Overview 1 Introduction: What is AMR and why might it be useful? 2 Main matter: Design of AMR; Contents of AMR 3 Nearly the end: A few more things about AMR

What is AMR and why might it be useful? What is AMR (briefly)? Abstract Meaning Representation (AMR) is a semantic representation language that aims to express the meaning of whole English sentences in a form readable by both humans and machines.

What is AMR and why might it be useful? Why was it created? AMR was created in response to the fragmented state of semantic annotation: many separate annotation efforts exist for individual tasks such as coreference, named entities, discourse connectives, etc. As a result, resources and effort are split across many different projects, which is a particular problem when it comes to building training data.

What is AMR and why might it be useful? Why was it created? - continued The goal of the authors is to build a simple, readable sembank of English sentences paired with their whole-sentence logical meanings, expressed in AMR. They believe such a sembank could have an impact on statistical Natural Language Understanding (NLU) and Generation (NLG) comparable to the impact the Penn Treebank had on statistical parsing.

Design of AMR Basic principles Abstract Meaning Representation relies on these basic principles, meant to ensure its suitability for sembanking: easy to work with for both humans and computers; several syntactic forms but one meaning, still one AMR; PropBank frames as a basis for the representation; from strings to meanings, or meanings to strings; not an interlingua: AMR is language-specific.

Design of AMR Easy to work with for both humans and computers AMRs are represented as rooted, directed, labeled graphs that are easy for humans to read and easy for programs to traverse. There are several different formats to work with: LOGIC format, AMR format, GRAPH format.

Design of AMR Figure: the same AMR shown in LOGIC format, AMR format, and GRAPH format (figures not reproduced).
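
As an illustration of the first two formats (the running example from the AMR paper; the GRAPH format draws the same structure as a picture): LOGIC format: ∃ w, b, g: instance(w, want-01) ∧ instance(b, boy) ∧ instance(g, go-01) ∧ arg0(w, b) ∧ arg1(w, g) ∧ arg0(g, b). AMR format: (w / want-01 :arg0 (b / boy) :arg1 (g / go-01 :arg0 b)). Both encode The boy wants to go.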

Design of AMR One meaning, one AMR AMR attempts to abstract away from syntactic representations: sentences with the same basic meaning should be assigned the same AMR, regardless of syntactic variation. Example (One AMR, several syntactic forms) (d / describe-01 :arg0 (m / man) :arg1 (m2 / mission) :arg2 (d2 / disaster)) The man described the mission as a disaster. The man's description of the mission: disaster. As the man described it, the mission was a disaster.

Design of AMR PropBank framesets as a basis for AMR PropBank is a corpus annotated with verbal propositions and their arguments. Each verbal predicate, together with its set of numbered arguments, is called a frameset. AMR uses these framesets to annotate the meanings of sentences. However, unlike in the PropBank corpus, even phrases containing no verb are annotated with PropBank framesets. Example (Related verbs and nouns map to one frameset) bond investor, to invest → invest-01
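
As a sketch of how a verbless phrase receives a verbal frameset (in the spirit of the paper, using the inverse relation :arg0-of introduced a few slides later): (p / person :arg0-of (i / invest-01)) investor, i.e. a person who invests.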

Design of AMR From strings to meanings, or meanings to strings AMR does not prescribe any rules for deriving meanings from strings or strings from meanings. This makes sembanking faster, since annotators simply write down the meaning associated with a sentence without justifying the steps used to get there. It also lets researchers explore their own ideas about how strings are related to meanings.

Design of AMR Not an Interlingua: AMR is language-specific AMR uses concepts (English words, PropBank framesets, or special keywords) inherited from English, and is therefore heavily biased towards it. It is not meant to represent the meaning of sentences in other languages. However, there has been work on developing AMRs for other languages and on using AMR as an additional transfer layer in Machine Translation (for more information, see Xue et al., 2014).

Design of AMR More about AMR graphs We can distinguish two main elements in AMR: concepts and semantic relations between those concepts. AMR uses variables to refer to instances of a certain concept. Leaves of the graph are labelled with concepts, so a labelled leaf is an instance of a given concept (e.g. boy). AMR uses approximately a hundred relations: frame arguments (:arg0, :arg1, etc.), general semantic relations (:age, :purpose, etc.), and relations for quantities, date-entities, and lists. Relations label the edges of the graph.

Design of AMR More about AMR graphs - continued Figure: The boy wants to go. Concepts instantiated in the graph: boy, want-01, go-01. Relations present in the graph: :arg0, :arg1.

Design of AMR (w / want-01 :arg0 (b / boy) :arg1 (g / go-01 :arg0 b)) The boy wants to go. w, b and g are variables, that is, instances of the concepts want-01, boy and go-01. This is denoted by the symbol /. Each concept and its arguments are enclosed in parentheses. :arg0 and :arg1 are semantic relations, denoted by the symbol :. b is the :arg0 of both w and g, and g is the :arg1 of w.

Contents of AMR Examples of AMR representations - General semantic relations (s / hum-02 :arg0 (s2 / soldier) :beneficiary (g / girl) :time (w / walk-01 :arg0 g :destination (t / town))) The soldier hummed to the girl as she walked to town.

Contents of AMR Examples of AMR representations - Inverse relations (s / sing-01 :arg0 (b / boy :source (c / college))) The boy from the college sang. (b / boy :arg0-of (s / sing-01) :source (c / college)) The college boy who sang. The top-level root of an AMR represents the focus of the sentence. With the inverse relations :arg0-of and :quant-of, it becomes possible to build rooted structures, changing the focus of the representation as needed.

Contents of AMR Examples of AMR representations - Modals and negation (g / go-01 :arg0 (b / boy) :polarity -) The boy did not go. (p / possible :domain (g / go-01 :arg0 (b / boy)) :polarity -) It is not possible for the boy to go. Negation is expressed with :polarity (note the -), and modals are expressed with concepts such as possible or obligate-01. We can also see that the copula is expressed by :domain ( It is not possible... )

Contents of AMR Examples of AMR representations - Questions (f / find-01 :arg0 (g / girl) :arg1 (a / amr-unknown)) What did the girl find? The concept amr-unknown is used for wh-questions; yes/no questions and imperatives are handled differently, using the relation :mode.
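
For instance (a sketch following the AMR specification rather than these slides), a yes/no question can be marked with :mode interrogative, as in (g / go-01 :arg0 (b / boy) :mode interrogative) Did the boy go?, and an imperative with :mode imperative, as in (g / go-01 :mode imperative :arg0 (y / you)) Go!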

Contents of AMR Quick question (o / obligate-01 :arg2 (g / go-01 :arg0 (b / boy)) :polarity -) The boy doesn't have to go.

Contents of AMR Quick question (f / find-01 :arg0 (g / girl) :arg1 (t / toy :poss (a / amr-unknown))) Whose toy did the girl find?

Contents of AMR Examples of AMR representations - Verbs and nouns (s / see-01 :arg0 (j / judge) :arg1 (e / explode-01)) The judge saw the explosion. (t / thing :arg1-of (o / opine-01 :arg0 (g / girl))) the girl's opinion Most English verbs have a corresponding PropBank frameset, and it is also possible to express most nouns using framesets.

Contents of AMR Examples of AMR representations - Named Entities (p / person :name (n / name :op1 "Mollie" :op2 "Brown")) Mollie Brown Any name can be handled with :name. Additionally, there are approximately 80 standardized types of named entities.

Contents of AMR Examples of AMR representations - Reification (m / marble :location (j / jar)) the marble in the jar (b / be-located-at-91 :arg1 (m / marble) :arg2 (j / jar)) The marble is in the jar. Reification allows us to use an AMR relation as a concept.

A few more things about AMR Limitations of AMR AMR has a few issues (it does). No inflectional morphology for tense and number, and no articles. No universal quantifiers. No distinction between real, hypothetical, future or imagined events. There are issues with the representation of some concepts, e.g. history teacher vs history professor.

A few more things about AMR Evaluation and inter-annotator agreement The authors created a metric called smatch to evaluate inter-annotator agreement. It reports the semantic overlap between two AMRs by viewing each AMR as a conjunction of logical triples and computing precision, recall and F-score over those triples. In an inter-annotator agreement study, four experts annotated 100 newswire sentences and 80 web-text sentences and then created consensus annotations through discussion. The average annotator-vs-consensus smatch score was 0.83 for newswire and 0.79 for web text. The average inter-annotator agreement among newly trained annotators was 0.71.
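
As an illustrative calculation (the numbers are invented; real smatch also searches for the best variable mapping between the two AMRs before counting): if two AMRs share 8 triples, with 10 triples in the first and 9 in the second, then precision P = 8/10 = 0.80, recall R = 8/9 ≈ 0.89, and F-score = 2PR/(P+R) ≈ 0.84.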

A few more things about AMR Current AMR Bank The AMR bank is composed of several thousand sentences and their annotations. Sources include the novel The Little Prince, news programs, and CCTV broadcast conversations. It takes 7-10 minutes to annotate a full sentence and 1-3 minutes to post-edit it.

A few more things about AMR Applications, extensions to AMR The authors' main goal is to build a large sembank for statistical NLU and MT applications. A disjunctive AMR has recently been created to allow annotators to express the same content in different ways: official talks vs state-sanctioned talks vs meetings sanctioned by the state. They also plan to include more relations, quantification, temporal relations, etc.

A few more things about AMR References Banarescu et al. (2013) Abstract Meaning Representation for Sembanking, Proc. Linguistic Annotation Workshop. Banarescu et al. (2014) Abstract Meaning Representation (AMR) 1.2 Specification. Xue et al. (2014) Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech, Proc. LREC. Cai and Knight (2013) Smatch: an Evaluation Metric for Semantic Feature Structures, Proc. ACL.

A few more things about AMR Finally the end. Thanks for listening.