Automatic Category Label Coarsening for Syntax-Based Machine Translation


Greg Hanneman and Alon Lavie
Language Technologies Institute, Carnegie Mellon University
Fifth Workshop on Syntax and Structure in Statistical Translation
June 23, 2011

SCFG-based MT: Motivation
  Training data annotated with constituency parse trees on both sides
  Extract labeled SCFG rules
    A::JJ [bleues]::[blue]
    NP::NP [D_1 N_2 A_3]::[DT_1 JJ_3 NNS_2]
  We think syntax on both sides is best
  But joint default label set is too large

Labeling ambiguity: Motivation
  Same RHS with many LHS labels
    JJ::JJ [快速]::[fast]
    AD::JJ [快速]::[fast]
    JJ::RB [快速]::[fast]
    VA::JJ [快速]::[fast]
    VP::ADJP [VV_1 VV_2]::[RB_1 VBN_2]
    VP::VP [VV_1 VV_2]::[RB_1 VBN_2]

Rule sparsity: Motivation
  Label mismatch blocks rule application
    VP::VP [VV_1 了 PP_2 的 NN_3]::[VBD_1 their NN_3 PP_2]
    VP::VP [VV_1 了 PP_2 的 NN_3]::[VB_1 their NNS_3 PP_2]
  Target phrases
    saw their friend from the conference (VBD + NN: first rule)
    see their friends from the conference (VB + NNS: second rule)
    saw their friends from the conference (VBD + NNS: matched by neither rule)

Motivation
  Solution: modify the label set
  Preference grammars [Venugopal et al. 2009]
    X rule specifies distribution over SAMT labels
    Avoids score fragmentation, but original labels still used for decoding
  Soft matching constraint [Chiang 2010]
    Substitute A::Z at B::Y with model cost subst(B, A) and subst(Y, Z)
    Avoids application sparsity, but must tune each subst(s_1, s_2) and subst(t_1, t_2) separately

Our Approach
  Difference in translation behavior => different category labels
    la grande voiture / the large car
    la plus grande voiture / the larger car
    la voiture la plus grande / the largest car
  Simple measure: how category is aligned to other language
    A::JJ [grande]::[large]
    AP::JJR [plus grande]::[larger]

L1 Alignment Distance
  [Figure: L1 alignment distances among the labels JJ, JJR, and JJS; values shown: 0.9941, 0.8730, 0.3996]
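The L1 distance between two labels can be sketched as follows. This is a minimal illustration, assuming each label's alignment distribution is the relative frequency of the other-language labels it is paired with in extracted rules; the function names and the counts are hypothetical and are not taken from the slides.

```python
from collections import defaultdict


def alignment_distributions(joint_counts):
    """Map each label to P(other-language label | label).

    joint_counts: dict of (label, other_label) -> count (hypothetical input format).
    """
    totals = defaultdict(float)
    for (label, _other), n in joint_counts.items():
        totals[label] += n
    dists = defaultdict(dict)
    for (label, other), n in joint_counts.items():
        dists[label][other] = n / totals[label]
    return dict(dists)


def l1_distance(p, q):
    """Sum of absolute differences between two distributions (range 0 to 2)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Made-up counts: English labels paired with French labels A and AP.
counts = {
    ("JJ", "A"): 90, ("JJ", "AP"): 10,
    ("JJR", "A"): 20, ("JJR", "AP"): 80,
}
dists = alignment_distributions(counts)
print(l1_distance(dists["JJ"], dists["JJR"]))
```

Labels that align to the other language in nearly the same way get a distance near 0; labels with disjoint alignments get a distance of 2.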

Label Collapsing Algorithm
  Extract baseline grammar from aligned tree pairs (e.g. Lavie et al. [2008])
  Compute label alignment distributions
  Repeat until stopping point:
    Compute L1 distance between all pairs of source and target labels
    Merge the label pair with smallest distance
    Update label alignment distributions
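The greedy loop above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: candidate pairs are taken within one language at a time (two source labels or two target labels), the stopping point is a fixed iteration count, merged labels are named by joining the old names with "+", and all function names and counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def l1(p, q):
    """L1 distance between two sparse distributions stored as dicts."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def conditionals(joint, side):
    """P(other-side label | label on `side`); side 0 = source, 1 = target."""
    totals = Counter()
    for pair, n in joint.items():
        totals[pair[side]] += n
    dists = {}
    for pair, n in joint.items():
        dists.setdefault(pair[side], {})[pair[1 - side]] = n / totals[pair[side]]
    return dists


def closest_pair(joint):
    """Return (distance, side, label_a, label_b) for the closest same-side pair."""
    best = None
    for side in (0, 1):
        dists = conditionals(joint, side)
        for a, b in combinations(sorted(dists), 2):
            d = l1(dists[a], dists[b])
            if best is None or d < best[0]:
                best = (d, side, a, b)
    return best


def merge(joint, side, a, b):
    """Relabel a and b as one label and re-aggregate the joint counts."""
    merged = Counter()
    for pair, n in joint.items():
        pair = list(pair)
        if pair[side] in (a, b):
            pair[side] = a + "+" + b
        merged[tuple(pair)] += n
    return merged


def collapse(joint, iterations):
    """Greedily merge the closest label pair `iterations` times."""
    joint, history = Counter(joint), []
    for _ in range(iterations):
        best = closest_pair(joint)
        if best is None:  # fewer than two labels left on both sides
            break
        d, side, a, b = best
        joint = merge(joint, side, a, b)
        history.append((side, a, b, d))
    return joint, history


# Made-up (source, target) label-pair counts.
counts = {("A", "JJ"): 90, ("A", "JJR"): 5, ("A", "JJS"): 5,
          ("AP", "JJ"): 2, ("AP", "JJR"): 50, ("AP", "JJS"): 48}
collapsed, history = collapse(counts, iterations=1)
print(history[0][:3])
```

Because the distributions are recomputed from the merged counts on every pass, the "update label alignment distributions" step is implicit here; a larger grammar would want incremental updates instead.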

Experiment 1
  Goal: Explore effect of collapsing with respect to stopping point
  Data: Chinese-English FBIS corpus (302k)
  Pipeline: Parallel Corpus -> Parse -> Word Align -> Extract Grammar -> Collapse Labels -> Build MT System

Effect on Label Set
  Number of unique labels in grammar

              Zh    En    Joint
  Baseline    55    71    1556
  Iter. 29    46    51    1035
  Iter. 45    38    44     755
  Iter. 60    33    34     558
  Iter. 81    24    22     283
  Iter. 99    14    14     106

Effect on Grammar
  Split grammar into three partitions:
    Phrase pair rules
      NN::NN [友好]::[friendship]
    Partially lexicalized grammar rules
      NP::NP [2000 年 NN_1]::[the 2000 NN_1]
    Fully abstract grammar rules
      VP::ADJP [VV_1 VV_2]::[RB_1 VBN_2]
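The three-way split can be illustrated with a small classifier. This is a sketch under an assumed rule representation (the `Rule` class and `partition` function are hypothetical): terminals are plain strings, and nonterminals are (label, index) tuples.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A right-hand-side symbol: a terminal word, or a (label, alignment index) nonterminal.
Symbol = Union[str, Tuple[str, int]]


@dataclass
class Rule:
    lhs: Tuple[str, str]   # (source label, target label)
    src: List[Symbol]      # source right-hand side
    tgt: List[Symbol]      # target right-hand side


def partition(rule: Rule) -> str:
    """Classify a rule by how many of its right-hand-side symbols are nonterminals."""
    symbols = rule.src + rule.tgt
    nonterminals = sum(isinstance(s, tuple) for s in symbols)
    if nonterminals == 0:
        return "phrase pair"
    if nonterminals == len(symbols):
        return "fully abstract"
    return "partially lexicalized"


rules = [
    Rule(("NN", "NN"), ["友好"], ["friendship"]),
    Rule(("NP", "NP"), ["2000", "年", ("NN", 1)], ["the", "2000", ("NN", 1)]),
    Rule(("VP", "ADJP"), [("VV", 1), ("VV", 2)], [("RB", 1), ("VBN", 2)]),
]
for r in rules:
    print(r.lhs, partition(r))
```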

Effect on Metric Scores
  NIST MT 03 Chinese-English test set
  Results averaged over four tune/test runs

              BLEU     METEOR   TER
  Baseline    24.43    54.77    68.02
  Iter. 29    27.31    55.27    63.24
  Iter. 45    27.10    55.24    63.41
  Iter. 60    27.52    55.32    62.67
  Iter. 81    26.31    54.63    63.53
  Iter. 99    25.89    54.76    64.82

Effect on Decoding
  Different outputs produced
    Collapsed 1-best in baseline 100-best: 3.5%
    Baseline 1-best in collapsed 100-best: 5.0%
  Different hypergraph entries explored in cube pruning
    90% of collapsed entries not in baseline
    Overlapping entries tend to be short
  Hypothesis: different rule possibilities lead search in complementary direction
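An overlap figure such as "1-best in 100-best" could be computed along these lines. The slides do not say how the comparison was made; this sketch assumes exact string match per test sentence, and the function name and toy data are hypothetical.

```python
def one_best_coverage(one_best, nbest_lists):
    """Fraction of sentences whose 1-best from one system appears
    in the other system's n-best list (exact string match assumed)."""
    if len(one_best) != len(nbest_lists):
        raise ValueError("need one n-best list per sentence")
    if not one_best:
        return 0.0
    hits = sum(hyp in set(nbest) for hyp, nbest in zip(one_best, nbest_lists))
    return hits / len(one_best)


# Toy data: two test sentences, n-best lists truncated to two entries each.
collapsed_1best = ["the large car", "they saw their friends"]
baseline_nbest = [["the big car", "the large car"],
                  ["they see their friend", "they saw their friend"]]
print(one_best_coverage(collapsed_1best, baseline_nbest))
```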

Experiment 2
  Goal: Explore effect of collapsing across language pairs
  Data: Chinese-English FBIS corpus, French-English WMT 2010 data
  Pipeline (run separately for Zh-En and Fr-En): Corpus -> Parse -> Word Align -> Extract Grammar -> Collapse Labels -> Build MT System

Effect on English Collapsing
  Adverbs
    Zh-En: RB, RBR
    Fr-En: RBR, RBS
  Verbs
    Zh-En: VB, VBG, VBN
    Fr-En: VB, VBD, VBN, VBP, VBZ, MD
  Wh-phrases
    Zh-En: ADJP, WHADJP; ADVP, WHADVP
    Fr-En: PP, WHPP

Effect on Label Set
  Full subtype collapsing: VNV VSB VRD VPT VCD VCP VC
  Partial subtype collapsing: NN NNS NNPS NNP N
  Combination by syntactic function: RRC WHADJP INTJ INS

Conclusions
  Can effectively coarsen labels based on alignment distributions
  Significantly improved metric scores at all attempted stopping points
  Reduces rule sparsity more than labeling ambiguity
  Points decoder in different direction
  Different results for different language pairs or grammars

Future Work
  Take rule context into account
    [NP::NP] [D_1 N_2]::[DT_1 NN_2]
    [NP::NP] [les N_2]::[NNS_2]
    la voiture / the car
    les voitures / cars
  Try finer-grained label sets [Petrov et al. 2006]
    NP -> NP-0, NP-1, ..., NP-30
    VBN -> VBN-0, VBN-1, ..., VBN-25
    RBS -> RBS-0
  Non-greedy collapsing

References
  Chiang (2010), Learning to translate with source and target syntax, ACL
  Lavie, Parlikar, and Ambati (2008), Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora, SSST-2
  Petrov, Barrett, Thibaux, and Klein (2006), Learning accurate, compact, and interpretable tree annotation, ACL/COLING
  Venugopal, Zollmann, Smith, and Vogel (2009), Preference grammars: Softening syntactic constraints to improve statistical machine translation, NAACL