Graphical Annotation for Syntax-Semantics Mapping

Kôiti Hasida
Social ICT Research Center, Graduate School of Information Science and Technology, The University of Tokyo
7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
E-mail: hasida.koiti@i.u-tokyo.ac.jp

Abstract
This paper discusses a potential work item (PWI) for an ISO standard (MAP) on linguistic annotation concerning syntax-semantics mapping. MAP is a framework for graphical linguistic annotation that specifies a mapping (a set of combinations) between the possible syntactic and semantic structures of the annotated linguistic data. Just like a UML diagram, a MAP diagram is formal, in the sense that it accurately specifies such a mapping. MAP provides a diagrammatic concrete syntax for linguistic annotation that is far easier to understand than a textual concrete syntax such as XML, so it can better facilitate collaboration among people involved in research, standardization, and practical use of linguistic data. MAP deals with syntactic structures including dependencies, coordinations, ellipses, transsentential constructions, and so on. The semantic structures treated by MAP are argument structures, scopes, coreferences, anaphora, discourse relations, dialogue acts, and so forth. To simplify explicit annotations, MAP allows partial descriptions and assumes a few general rules on the correspondence between syntactic and semantic compositions.

Keywords: diagrammatic annotation, syntax-semantics mapping, standardization

1. Introduction
This paper introduces a potential work item (PWI) for an ISO standard of linguistic annotation concerning syntax-semantics mapping, which we call MAP for convenience in the rest of the paper. MAP is an extension of SemAF-DS (ISO, 2013), which in turn is based on Linguistic DS (Description Scheme) in ISO/IEC (2004).
Importing more from Linguistic DS, MAP extends several standards devised by ISO/TC37/SC4, including LAF (Linguistic Annotation Framework; ISO 2012a), SynAF (Syntactic Annotation Framework; ISO 2010), and SemAF (Semantic Annotation Framework; ISO 2012b, 2012c, 2013), while incorporating insights from the relevant literature (Asher & Lascarides 2003; Carlson, et al., 2003; Hajič, et al. 2006; Mann & Thompson 1988; Palmer, et al. 2005; Prasad, et al. 2008; PTB). MAP defines how to diagrammatically annotate linguistic data so as to specify a mapping between its possible syntactic and semantic structures. The syntactic structures may be dependencies, coordinations, ellipses, and so forth, encompassing both intrasentential and transsentential constructions. The semantic structures consist of argument structures, scopes (of quantifications, negations, modal operators, etc.), coreferences, anaphora, and so on.

A major purpose of MAP is to facilitate collaboration among people involved in research, standardization, and practical use of linguistic annotation. To that end, MAP provides a diagrammatic concrete syntax (ISO, 2012b, 2012c) for linguistic annotations that is far easier to understand than a traditional textual concrete syntax such as XML. Besides being diagrammatic and intuitive, MAP is formal in the same sense that UML is formal: a MAP diagram accurately specifies a mapping between the syntactic structures and the semantic structures of the annotated linguistic data in question.

The rest of the paper is organized as follows. Section 2 introduces MAP diagrams that represent annotated linguistic data. Sections 3 and 4 discuss further details of annotations concerning local and nonlocal compositions, respectively. Section 5 concludes the paper.

2. Annotated Segment
Let us refer to markable (annotatable) linguistic data as segments. A segment may be text, audio, video, etc., and may be intrasentential or transsentential.
In MAP, a segment may be accompanied by a syntactic annotation, a semantic structure, or both. Such a possibly annotated segment is diagrammatically represented by a possibly multi-part box, as in Figure 1.
Figure 1: Annotated Segment as MAP Diagram

The top gray part of the box contains a syntactic annotation of the segment. The middle white part is the body of the whole box and contains the segment itself. As discussed later, this body part may recursively embed smaller annotated segments and, together with the syntactic-annotation part, partially specifies the syntactic structure of the segment. The bottom gray part contains a possible semantic structure of the segment. This paper assumes that semantic structures are labelled directed graphs (such as semantic networks and RDF graphs), as in Figure 1, but MAP allows any other format for representing semantic structures. Such an annotated segment defines a mapping between the possible syntactic structures and the possible semantic structures of the segment. The example in Figure 1 involves no syntactic ambiguity, but some examples in the rest of the paper are syntactically ambiguous, so they accommodate multiple possible syntactic structures and therefore multiple possible semantic structures.

3. Local Compositions
The semantic structure (as a labelled directed graph) annotating a segment as in Figure 1 has two designated nodes: the head node and the governor node of the segment. The head node has a thick border, and the governor node is depicted as a balloon. So the leave&past node in Figure 1 is both the head node and the governor node of the segment "Tom left". In Figure 2, the @Tom node is the head node and the empty node is the governor node of the segment "Tom".
Figure 2: Tom Referencing the Agent of an Action

This annotated segment represents "Tom" as a noun phrase referring to Tom as the agent of some action represented by the governor node. In general, the governor node of segment X is equal to the head node of segment Y when X (syntactically and hence semantically) depends on (i.e., is governed by) Y, as explained later.

Annotated segments may be embedded in the body part of a larger segment composed of them. The embedded segments are ordered from left to right and from top to bottom in the case of Western languages. For instance, Figure 3 shows an annotated segment "Tom left" whose body part embeds two daughter segments, "Tom" and "left".
Figure 3: Local Dependency

In general, a thick-bordered daughter segment is the head daughter of the mother segment. So the segment "left" is the head daughter of "Tom left" in this example.
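The box structure and the two designated nodes can be sketched as a small data structure. This is a minimal Python sketch, not part of the PWI. It assumes that a semantic structure is stored as a set of (source, label, target) edges. The class name, the field names, the helper `head_daughter`, the placeholder id `_gov` for the empty governor node of Figure 2, and the event-to-participant direction of the agent edge are all hypothetical. Giving "left" the governor leave&past is also an assumption, made for consistency with rule [2] below.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: a semantic structure is a labelled directed
# graph stored as (source, label, target) edges over node ids.
Edge = Tuple[str, str, str]


@dataclass
class AnnotatedSegment:
    """One MAP box; field names are illustrative, not taken from the PWI."""
    body: str                                # middle part: the segment itself
    syntax: Optional[str] = None             # top gray part (syntactic annotation)
    semantics: Set[Edge] = field(default_factory=set)  # bottom gray part
    head: Optional[str] = None               # node drawn with a thick border
    governor: Optional[str] = None           # node drawn as a balloon
    daughters: List[AnnotatedSegment] = field(default_factory=list)
    is_head_daughter: bool = False           # thick-bordered daughter box

    def nodes(self) -> Set[str]:
        """All node ids in the semantic structure, including isolated
        head/governor nodes that no edge mentions."""
        found = {n for (src, _, dst) in self.semantics for n in (src, dst)}
        found.update(x for x in (self.head, self.governor) if x is not None)
        return found


def head_daughter(mother: AnnotatedSegment) -> Optional[AnnotatedSegment]:
    """Return the thick-bordered daughter of a mother segment, if any."""
    return next((d for d in mother.daughters if d.is_head_daughter), None)


# Figure 1, "Tom left": one node is both head and governor.
# The edge direction (event -> participant) is an assumption.
tom_left = AnnotatedSegment(
    body="Tom left",
    semantics={("leave&past", "agent", "@Tom")},
    head="leave&past",
    governor="leave&past",
)

# Figure 2, "Tom": the head is @Tom; the governor is an empty node,
# given the placeholder id "_gov", standing for "some action".
tom = AnnotatedSegment(
    body="Tom",
    syntax="NP",
    semantics={("_gov", "agent", "@Tom")},
    head="@Tom",
    governor="_gov",
)

# Figure 3: "Tom left" embedding two daughters, with "left" as head daughter.
left = AnnotatedSegment(body="left", head="leave&past",
                        governor="leave&past", is_head_daughter=True)
fig3 = AnnotatedSegment(body="Tom left", daughters=[tom, left])
```

Storing edges as a set keeps the union in rule [1] a plain set operation. The explicit `head` and `governor` fields mirror the thick border and the balloon in the diagrams.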
MAP assumes the following general rules for dependency constructions:
[1] The semantic structure of the mother segment is the union of the semantic structures of the daughter segments.
[2] The mother segment and the head daughter segment share the same head node and the same governor node.
[3] The governor nodes of the dependent daughter segments are the head node of the mother segment (which is the same as the head node of the head daughter segment, due to [2]).

These rules simplify annotations. For instance, the annotated segment in Figure 3 is equivalent to the one in Figure 5, because the semantic structure of the whole segment in Figure 3 is derived from those of the daughter segments by the above rules.
Figure 5: Simplified Annotation Equivalent to Figure 3

Figure 5 is a typical MAP annotation. Only the lexical-entry segments are explicitly annotated with semantic structures, and the semantic structures of larger segments are implicitly derived by the above rules.
Figure 4: Intersentential Dependency
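Rules [1] to [3] can be read as a procedure that derives a mother segment's semantic structure from those of its daughters. The sketch below is a hypothetical encoding, not the PWI's. Each segment is a plain graph (a node set plus labelled edges) with its head and governor ids, and `_gov` is a placeholder for the empty governor node of Figure 2. `compose` is an illustrative name, not an API defined by MAP. Applied to the lexical entries of "Tom left", it yields the semantic structure of Figure 1 under this encoding.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)


@dataclass(frozen=True)
class Segment:
    """Hypothetical minimal encoding of an annotated segment."""
    body: str
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]
    head: str
    governor: str


def compose(body: str, head_dtr: Segment, dependents: List[Segment]) -> Segment:
    """Derive a mother segment's semantics from its daughters."""
    # Rule [2]: the mother shares head and governor with its head daughter.
    head, governor = head_dtr.head, head_dtr.governor
    nodes, edges = set(head_dtr.nodes), set(head_dtr.edges)
    for dep in dependents:
        # Rule [3]: a dependent's governor node *is* the mother's head node,
        # so identify the two by renaming before merging.
        same = {dep.governor: head}
        # Rule [1]: the mother's structure is the union of the daughters'.
        nodes |= {same.get(n, n) for n in dep.nodes}
        edges |= {(same.get(s, s), lab, same.get(t, t)) for s, lab, t in dep.edges}
    return Segment(body, frozenset(nodes), frozenset(edges), head, governor)


# Only lexical entries are annotated, as in the simplified annotation.
tom = Segment("Tom", frozenset({"_gov", "@Tom"}),
              frozenset({("_gov", "agent", "@Tom")}),
              head="@Tom", governor="_gov")
left = Segment("left", frozenset({"leave&past"}), frozenset(),
               head="leave&past", governor="leave&past")

tom_left = compose("Tom left", left, [tom])
print(sorted(tom_left.edges))  # [('leave&past', 'agent', '@Tom')]
```

In this sketch, renaming the dependent's governor to the mother's head is all that rule [3] contributes. A fuller implementation would also need fresh node ids, so that unrelated placeholders in different daughters never collide.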

The same rules also apply to dependencies outside of sentences (i.e., dependencies among sentences, paragraphs, sections, and so forth), as shown in Figure 4.

Distributive coordinations are accounted for by rule [1] alone. Figure 7 shows how this works. Again, the semantic structure of the mother segment may be omitted thanks to the rule.
Figure 7: Distributive Coordination

This whole noun phrase and a verb phrase compose a sentence while duplicating the head node of the verb phrase, as shown in Figure 6.
Figure 6: Semantic-Structure Duplication Due to a Distributive Coordination

On the other hand, a collective coordination has a single head node and a single governor node, though further details are omitted in this abstract.

4. Nonlocal Compositions
MAP uses typed links to express relationships among non-adjacent segments. For instance, a dep link addresses a non-adjacent dependency, such as the extraposition in Figure 8.
Figure 8: Extraposition

From here on, the syntactic-annotation parts and the semantic-structure parts of the segments are omitted for simplicity. An eq link addresses a coreference, as in Figure 9.
Figure 9: Coreference

Precisely speaking, an eq link represents the coreference between the head nodes of the two linked segments. So an eq link is also used for relativization, to address the coreference between the head noun and the gap in the relative clause, as in Figure 10.
Figure 10: Relativization

A partof link means that the head node of the source segment refers to a part of the referent of the head node of the destination segment. Figure 11 shows an example of indirect anaphora, where the partof link means that the door is a part of the house.
Figure 11: Indirect Anaphora

A coscope link means that the head nodes of the two linked segments belong to the same scope (of quantification, negation, modal operator, or other type of abstraction).
For instance, the example in Figure 12 means that there is a specific woman whom every man loves, because the woman belongs to the same scope as the state of affairs referenced by the entire sentence.
Figure 12: Wide-Scope Reading of "a woman"
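A coscope link states that two head nodes lie in the same scope. Coscope links therefore induce an equivalence relation, which a union-find structure tracks directly. The sketch below is a minimal illustration under stated assumptions. Links are (type, source, destination) triples over hypothetical segment ids, and the whole sentence is the id "S". For the narrow reading, "a woman" is assumed to be linked to the loving event ("loves"), since the exact link in Figure 13 is not spelled out in the text. Only coscope links are consulted; dep, eq, and partof links are accepted but ignored here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical encoding of MAP typed links: (type, source, destination),
# where source and destination are segment ids.
Link = Tuple[str, str, str]
LINK_TYPES = {"dep", "eq", "partof", "coscope"}


class ScopeSets:
    """Union-find over segment ids: a coscope link puts the head nodes of
    two segments into one scope, so linked segments share a class."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def scope_sets(links: Iterable[Link]) -> ScopeSets:
    """Group segments into scopes using only the coscope links."""
    sets = ScopeSets()
    for kind, src, dst in links:
        if kind not in LINK_TYPES:
            raise ValueError(f"unknown link type: {kind}")
        if kind == "coscope":  # dep, eq and partof links do not affect scope
            sets.union(src, dst)
    return sets


def wide_scope(indefinite: str, outer: str, links: Iterable[Link]) -> bool:
    """True if `indefinite` shares a scope with `outer` (here the whole
    sentence), i.e. it takes wide scope and denotes one specific referent."""
    sets = scope_sets(links)
    return sets.find(indefinite) == sets.find(outer)


# "Every man loves a woman." Segment ids are illustrative; linking the
# narrow reading to the loving event ("loves") is an assumption.
WIDE = [("coscope", "a woman", "S")]
NARROW = [("coscope", "a woman", "loves")]
print(wide_scope("a woman", "S", WIDE), wide_scope("a woman", "S", NARROW))  # True False
```

The same check would cover the doctor example of Figures 14 and 15 if `outer` is taken to be the topmost scope encompassing the entire discourse.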

On the other hand, the example in Figure 13 means that different men may love different women.
Figure 13: Narrow-Scope Reading of "a woman"

Similarly, Figure 14 means that there is a specific doctor whom Jane wants to marry, because the coscope link points to the topmost scope encompassing the entire discourse. In Figure 15, by contrast, there is no such specific doctor, because the coscope link there means that the marrying event and the doctor belong to the same scope: that of the modal operator corresponding to "wants".
Figure 14: Wide-Scope Reading of "a doctor"
Figure 15: Narrow-Scope Reading of "a doctor"

The cp and subst links address ellipses; they reformulate part of the Penn Treebank (PTB) annotation scheme. For instance, the example in Figure 16 means that Bill wants to date Sue. The reason is that the latter half of the sentence is interpreted by copying the former half while replacing Tom with Bill and Mary with Sue.
Figure 16: Ellipsis

Similarly, the example in Figure 18 illustrates a comparative construction involving an ellipsis. Here "Sue" is interpreted as "Tom loves Sue", by copying "Tom loves Mary" while replacing Mary with Sue.
Figure 18: Ellipsis in Comparative

The discourse "Tom loves his wife. So does Bill." is ambiguous as to whether Bill loves Tom's wife (so-called strict identity) or Bill's wife (sloppy identity).
Figure 19: Ambiguity Concerning Strict/Sloppy Identity

This ambiguity is resolved by coscope links. If "his" has a wider scope than "Tom loves his wife.", then the copy operation excludes "his", and hence the eq link as well, so that Bill loves Tom's wife.
Figure 17: Strict Identity
Figure 20: Sloppy Identity
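The copy operation behind cp and subst links can be sketched as a graph copy with node substitution. In this sketch, the set of copied nodes stands in for what the coscope links place inside the copied scope. This is a minimal illustration, not the paper's formalization. The node ids (love1, wife1, his1), the edge labels agent, patient, and of, and the function names are hypothetical. Eq links are kept as (anaphor, antecedent) pairs and are copied only when the anaphor itself is copied. Leaving his1 outside the copied scope gives the strict reading just described. Including it gives the sloppy reading, where the copied eq link's destination Tom is replaced by Bill.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)
Eq = Tuple[str, str]         # eq link: (anaphor, antecedent)

# "Tom loves his wife." Node ids and labels are illustrative.
EDGES: Set[Edge] = {
    ("love1", "agent", "@Tom"),
    ("love1", "patient", "wife1"),
    ("wife1", "of", "his1"),
}
EQS: Set[Eq] = {("his1", "@Tom")}


def copy_subst(edges: Set[Edge], eqs: Set[Eq], scope: Set[str],
               subst: Dict[str, str]) -> Tuple[Set[Edge], Set[Eq]]:
    """cp: copy the nodes in `scope` under fresh ids; subst: replace the
    nodes in `subst`. Nodes outside the scope are shared, not copied, and
    an eq link is copied only if its anaphor lies inside the scope."""
    def image(n: str) -> str:
        if n in subst:
            return subst[n]
        return n + "'" if n in scope else n

    new_edges = {(image(s), lab, image(t)) for s, lab, t in edges if s in scope}
    new_eqs = {(image(a), image(b)) for a, b in eqs if a in scope}
    return new_edges, new_eqs


def whose_wife(scope: Set[str]) -> str:
    """Interpret "So does Bill." by copying with Tom -> Bill, then resolve
    the possessor of the copied wife through the eq links."""
    new_edges, new_eqs = copy_subst(EDGES, EQS, scope, {"@Tom": "@Bill"})
    wife = next(t for s, lab, t in new_edges if lab == "patient")
    owner = next(t for s, lab, t in new_edges if s == wife and lab == "of")
    return dict(EQS | new_eqs).get(owner, owner)


# Strict: "his" scopes wider than the sentence, so it stays outside the copy.
print(whose_wife({"love1", "wife1"}))          # @Tom
# Sloppy: "his" shares the sentence's scope, so it and its eq link are copied.
print(whose_wife({"love1", "wife1", "his1"}))  # @Bill
```

Under the same assumptions, `copy_subst` would handle the ellipsis of Figure 16 by substituting two nodes at once, e.g. `{"@Tom": "@Bill", "@Mary": "@Sue"}`.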

If "his" and "Tom loves his wife." have the same scope, on the other hand, then the copy operation includes the eq link, and its destination ("Tom") is replaced with Bill, which means that Bill loves Bill's wife.

5. Final Remarks
MAP provides a diagrammatic annotation scheme to specify mappings between the syntactic and semantic structures of annotated segments. In typical annotations, only the lexical-entry segments are explicitly annotated with semantic structures. Rules [1] through [3], together with the links among segments, then derive the semantic structures of larger segments. MAP, NAF (Fokkens, et al., 2014), and NKF (NLP Annotation Knowledge-Base Format) are closely related potential work items in ISO/TC37/SC4/WG5. Since they have similar objectives and hence many common features, their relationship must be sorted out to define how to coordinate them.

References
N. Asher & A. Lascarides (2003) Logics of Conversation. Cambridge University Press.
L. Carlson, D. Marcu, M. E. Okurowski (2003) Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In J. van Kuppevelt & R. Smith (eds.) Current Directions in Discourse and Dialogue, 85-112. Kluwer Academic Publishers.
A. Fokkens, A. Soroa, Z. Beloki, N. Ockeloen, G. Rigau, W. R. van Hage, P. Vossen (2014) NAF and GAF: Linking Linguistic Annotations. Proceedings of the 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation. Reykjavik, Iceland.
J. Hajič, et al. (2006) Prague Dependency Treebank 2.0. Linguistic Data Consortium, Philadelphia.
ISO (2010) ISO 24615:2010, Language resource management. Syntactic annotation framework (SynAF).
ISO (2012a) ISO 24612:2012, Language resource management. Linguistic annotation framework (LAF).
ISO (2012b) ISO 24617-1:2012, Language resource management. Semantic annotation framework. Part 1: Time and events (SemAF-Time, ISO-TimeML).
ISO (2012c) ISO 24617-2:2012, Language resource management. Semantic annotation framework. Part 2: Dialogue acts.
ISO (2013) ISO/TS 24617-5, Language resource management. Semantic annotation framework (SemAF). Part 5: Discourse structure (SemAF-DS).
ISO/IEC (2004) ISO/IEC 15938-5:2003/Amd 1:2004, Information technology. Multimedia content description interface. Part 5: Multimedia description schemes. Amendment 1: Multimedia description schemes extensions (MPEG-7 MDS AMD1).
W. Mann & S. Thompson (1988) Rhetorical Structure Theory: A Theory of Text Organisation. Text, 8(3), 243-281.
M. Palmer, D. Gildea, P. Kingsbury (2005) The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1), 71-105.
R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, et al. (2008) The Penn Discourse Treebank 2.0. Proceedings of the 6th International Conference on Language Resources and Evaluation.
PTB. The Penn Treebank Project. http://www.cis.upenn.edu/~treebank/