Evaluating Translational Correspondence using Annotation Projection


Evaluating Translational Correspondence using Annotation Projection
R. Hwa, P. Resnik, A. Weinberg & O. Kolak (2002)
Presented by Jeremy G. Kahn
Presentation for Ling 580 (Machine Translation), 10 Jan 2006

Introduction

The Main Issue: syntactic divergence.
Trees in 2 languages may be homomorphic... or not:
- same basic shape, different rotations of the CFG
- different basic shape of the CFG (rules can't correspond)

Direct Correspondence Assumption (DCA): the syntactic relationships in one
language directly map to the syntactic relationships in the other.
In this paper, syntactic relationships are dependencies.

Exploring the DCA

DCA is implicit in:
1. Inversion Transduction Grammar (ITG) (& D. Wu's SITG)
   Suppose: L1 is SVO, D-N language; L2 is SOV, N-D language
     S -> NP VP
     NP -> [D N]
     VP -> [V NP]
   Note the special meaning of the bracketing.
   (not mentioned in the paper: Melamed's synchronous CFGs, a superset of ITG)
2. synchronized dependency trees (Alshawi et al. 2000)


DCA as formalism

Given a pair of sentences E and F that are (literal) translations of each
other, with syntactic structures Tree_E and Tree_F:

if nodes x_E and y_E of Tree_E are aligned with nodes x_F and y_F of Tree_F
respectively, and if syntactic relationship R(x_E, y_E) holds in Tree_E,
then R(x_F, y_F) holds in Tree_F.

Why is the DCA good?

- matches a linguistic thought: thematics (dependencies) are held constant,
  but word order may change
- fairly elegant conceptually
- allows us to take advantage of formalisms like ITG and synchronized trees

Potential problems with the DCA

1. word-to-word correspondence questions: morphology in one language may be
   a word (or word order) in another
   - the Basque dative vs. English "for"
   - Basque "buy" + past (two words) vs. the English portmanteau "bought"
2. the tree structures in use may not have the right rotational operations
   (not mentioned in the text): SVO vs. OSV languages (Arabic), using
   2-branching
   - ex: [I [like apples]] vs. [apples [I like]]; the VP relation becomes
     disconnected

Looking at the DCA: a task

Comparing English (En) and Chinese (Zh) structures through projection.

Given: gold English parses (dependencies) & a gold word alignment
Task: project the En (dependency) structures onto the Zh word sequence
Evaluate: projected En->Zh dependencies vs. independently derived Zh
dependencies (unlabeled dependency P, R, F)

Corpus

Dev set: 124 Zh sentences (avg. length 23.7), En translations by hand.
Zh dep trees derived by hand (guided by the treebank).
(2 annotators, 92.4% annotation agreement)

Test set: 88 Zh sentences (avg. length 19.0), En translations from the NIST
MT project. Zh dep trees derived automatically from the treebank (a la Xia &
Palmer 2001).

Both sets: Zh trees originate with the Zh treebank (but deps derived
differently). En deps generated via parser (Collins 97) and hand-corrected.

Algorithm 1: Direct Projection Algorithm (DPA)

4 cases:
1. paired 1-to-1 alignments: two 1-1 alignments that share an E-side
   dependency induce an F-side dependency.
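Case 1 can be sketched in a few lines. This is a minimal illustration under assumed (hypothetical) names: English dependencies are (head, dependent) index pairs, and `align` maps English word indices to Chinese word indices, restricted to the 1-to-1 portion of the alignment.

```python
# DPA case 1: project every English dependency whose head and dependent
# both align 1-to-1 with Chinese words.

def project_one_to_one(en_deps, align):
    zh_deps = set()
    for head_e, dep_e in en_deps:
        if head_e in align and dep_e in align:
            zh_deps.add((align[head_e], align[dep_e]))
    return zh_deps

# English edges 0->1 and 0->2; words 0 and 2 align 1-to-1, word 1 does not
en_deps = {(0, 1), (0, 2)}
align = {0: 1, 2: 0}
print(project_one_to_one(en_deps, align))  # {(1, 0)}
```

The edge involving the unaligned word 1 is simply not projected here; cases 2-4 below handle the non-1-to-1 configurations.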

Algorithm 1: Direct Projection Algorithm (DPA) #2

2. unaligned E-side: for each En word w_e with no Zh counterpart, create an
   F-side word n_f. For each E-side dependency involving w_e, if the
   non-w_e token (x_e) aligns 1-to-1 with an F-side word (x_f), induce an
   F-side dependency between n_f and x_f.

Algorithm 1: Direct Projection Algorithm (DPA) #3

3. 1 En to many Zh: for a single E-side word w_e aligned with several w_f
   words, invent an F-side word n_f and make all the w_f words children of
   n_f. Align w_e to n_f. (Presumably, return to case 1.)

Algorithm 1: Direct Projection Algorithm (DPA) #4

4. many En to 1 Zh: a single F-side word w_f is aligned with several w_e
   words. Select a head w_eh from the w_e words, and align w_f with w_eh
   only. Also, any dependencies that involve the modifier (non-head) E-side
   words (m_e) should be pointed at w_f on the F-side.

Many-to-many is (vaguely) 1-to-many, then many-to-1 (?)
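Case 4 can be sketched as an edge-rewriting step. This is a minimal illustration, not the paper's implementation: the names `w_es` (the English indices aligned to one Chinese word), `head_e` (the selected head, however chosen), and the proxy token standing in for w_f are all assumptions.

```python
# DPA case 4: several English words align to one Chinese word. Keep the
# head's edges as-is; redirect every modifier endpoint onto the single
# Chinese word (represented here by a proxy token).

def collapse_many_to_one(en_deps, w_es, w_f_proxy, head_e):
    new_deps = set()
    for h, d in en_deps:
        h2 = w_f_proxy if (h in w_es and h != head_e) else h
        d2 = w_f_proxy if (d in w_es and d != head_e) else d
        if h2 != d2:  # drop edges that collapse onto a single node
            new_deps.add((h2, d2))
    return new_deps

# English words 1 and 2 both align to one Chinese word ("zh5"); head is 1
out = collapse_many_to_one({(0, 1), (1, 2), (2, 3)}, {1, 2}, "zh5", 1)
print(out == {(0, 1), (1, "zh5"), ("zh5", 3)})  # True
```

After this rewrite, the selected head is aligned 1-to-1 with w_f and case 1 applies again.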

Error analysis of DPA

Dev set results ("exp 1") show that DPA on the dev set is lousy:
P 30.1, R 39.1.

Error analysis: lots of multiply-aligned and unaligned tokens; in
particular, differences in morpheme boundaries and word content.

Chinese measure words (ex. as diagram):
  yi ge ping-guo
  one MEAS apple
  "an apple"

Chinese aspect words:
  qu le
  go COMPLETE
  "went" or "to have gone"

These emerge as 1-En-to-many-Zh and unaligned-Zh cases.

Revised DPA: Revision 1: head-initial

Revised 1-to-many rule: rather than creating n_f, just assume that the
left-most F word is the head and draw dependencies from there.
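The revised rule is simple enough to sketch directly. A minimal illustration, assuming the several Chinese words aligned to one English word are given as a set of position indices:

```python
# Revision 1 (head-initial): instead of inventing a dummy node n_f, take
# the left-most aligned Chinese word as the head and attach the rest to it.

def head_initial(w_fs):
    head = min(w_fs)  # left-most position
    return head, {(head, w) for w in w_fs if w != head}

head, deps = head_initial({4, 2, 3})
print(head, sorted(deps))  # 2 [(2, 3), (2, 4)]
```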

Revised DPA: Revision 2: Zh-side cleanup

Restricted themselves to:
- closed-class items
- POS info projected from En
- easily listed lexical categories

An example: if a series of Zh words is aligned with an En noun, make the
right-most word the head. (Chinese is right-headed in the nominal system,
left-headed elsewhere.)

Revised DPA: Revision 2: Zh-side cleanup

Other examples include:
- enchaining de (the linking subordinator)
- currency handling

(Wanna look at the rules? They're in a tech report, so you'll have to write
Dr. Hwa.)

Results

Method                    Precision  Recall  F-measure
DPA                       34.5       42.5    38.1
RDPA 1 (head-initial)     59.4       59.4    59.4
RDPA 1+2 (h-i & rules)    68.0       66.6    67.3

A total relative F-measure gain of 76.6% over the baseline(!)
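The table's arithmetic checks out: F is the harmonic mean of precision and recall, and the 76.6% figure is the gain relative to the DPA baseline F.

```python
# Verify the reported F-measures and the relative gain over the baseline.

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(f_measure(34.5, 42.5), 1))       # 38.1  (DPA)
print(round(f_measure(68.0, 66.6), 1))       # 67.3  (RDPA 1+2)
print(round(100 * (67.3 - 38.1) / 38.1, 1))  # 76.6  (relative gain, %)
```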

Discussion

- application of minimal linguistic knowledge to transfer information from
  one language to another
- on the MT pyramid: a low-to-middle approach, but much syntax gained!
- potential applications for MT?
  - learn syntactic relations from translations of well-parsed English
  - learn phrase boundaries?
- Stats MT (mostly) doesn't use the DCA; how can these be combined?