Syntactic Reordering of Source Sentences for Statistical Machine Translation

Mohammad Sadegh Rasooli, Columbia University (rasooli@cs.columbia.edu)
April 9, 2013

Overview
1. First paper: Collins et al. (2005)
   - The Role of Syntax in SMT
   - Syntactic Preprocessing Approaches
   - Clause Restructuring
   - Experiments
   - Discussion
2. Second paper: P. Xu et al. (2009)
   - Approaches to Syntactic Reordering
   - Translation Between SVO and SOV Languages
   - Precedence Reordering Based on a Dependency Parser
   - Experiments
   - Discussion

First Paper
M. Collins et al.: Clause Restructuring for Statistical Machine Translation. ACL 2005.

The Role of Syntax in SMT
In the original phrase-based SMT models, syntax is not taken into account, and phrase-based systems have limited ability to model word-order differences between languages. These differences are treated as distortion: each reordering of a phrase adds a distortion penalty to the overall score of the translation.
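To make the distortion idea concrete, here is a minimal Python sketch of one common distance-based formulation: whenever the decoder jumps in the source sentence, it pays the distance between the start of the next source phrase and the end of the previously translated one. The function name, the 0-based inclusive spans, and the phrase segmentation in the example are illustrative assumptions, not taken from either paper.

```python
def distortion_cost(phrase_spans):
    """Total distance-based distortion cost (illustrative sketch).

    phrase_spans: (start, end) source positions, 0-based and inclusive, of the
    source phrases in the order in which the decoder translates them.
    """
    cost = 0
    prev_end = -1  # so that starting at source position 0 costs nothing
    for start, end in phrase_spans:
        cost += abs(start - prev_end - 1)  # size of the jump in the source
        prev_end = end
    return cost


# Hypothetical source: the German word-by-word gloss
# "I(0) will(1) to(2) you(3) the(4) corresponding(5) comments(6) pass(7) on(8)"
# Target order wanted: "I will | pass on | to you | the corresponding comments"
monotone = [(0, 1), (2, 3), (4, 6), (7, 8)]
reordered = [(0, 1), (7, 8), (2, 3), (4, 6)]
print(distortion_cost(monotone))   # 0
print(distortion_cost(reordered))  # 12
```

In a log-linear phrase-based model such a cost is typically used as a feature with a tuned negative weight, so the long jump needed to bring "pass on" forward is penalized even though it yields the correct English order. This is the weakness that source-side preprocessing tries to avoid.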

Example: German vs. English Word Order
English: I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.
German (word-by-word gloss): I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.

Research on Syntax in MT
Idea: change the word order of one or both languages so that their word orders become more similar to each other.
- Syntax-based MT approaches
  - Use bitext grammars to parse both the source and the target side.
  - Model the syntax of the target language alone.
  - Recast the translation problem as a parsing problem.
- Reranking methods: select among the N-best outputs of a phrase-based system using syntactic information.
- Preprocessing approaches: modify the source-language sentences before translation. This is the approach used in this paper.

Syntactic Preprocessing Approaches
English: I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.
German (word-by-word gloss): I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.
German (preprocessed): I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.

Clause Restructuring
Steps (applied both in training and in decoding):
1. Parse the source sentence.
2. Apply the reordering rules to the parsed source sentence.
3. Train and decode with a standard phrase-based model on the reordered source sentences.
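Step 2 can be illustrated with a toy sketch. The Python code below applies one rule of the kind the paper uses (move the head verb to the front of its VP) to a hand-built tree for the first clause of the running example. The `Tree` class, the simplified labels (S, NP, AUX, VP, PP, V), and the shortcut of treating the last `V` child as the head verb are assumptions made for illustration; a real system would use a German treebank parser and proper head rules.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tree:
    label: str
    children: List[Union["Tree", str]] = field(default_factory=list)


def words(tree):
    """Leaves of the tree, left to right."""
    out = []
    for child in tree.children:
        out.extend(words(child) if isinstance(child, Tree) else [child])
    return out


def verb_initial(tree):
    """Hypothetical rule: in every VP, move the head verb to the VP's front.

    Returns a new tree; the input tree is left unchanged.
    """
    children = [verb_initial(c) if isinstance(c, Tree) else c for c in tree.children]
    if tree.label == "VP":
        # Simplification: take the last child labelled "V" as the head verb.
        verbs = [i for i, c in enumerate(children) if isinstance(c, Tree) and c.label == "V"]
        if verbs:
            i = verbs[-1]
            children = [children[i]] + children[:i] + children[i + 1:]
    return Tree(tree.label, children)


# Toy parse of the German gloss "I will to you the corresponding comments pass on"
clause = Tree("S", [Tree("NP", ["I"]), Tree("AUX", ["will"]),
                    Tree("VP", [Tree("PP", ["to you"]),
                                Tree("NP", ["the corresponding comments"]),
                                Tree("V", ["pass on"])])])
print(" ".join(words(clause)))                # I will to you the corresponding comments pass on
print(" ".join(words(verb_initial(clause))))  # I will pass on to you the corresponding comments
```

The rule works on the tree rather than on the word string, which is what lets it move a whole constituent in one step; the phrase-based model in step 3 then sees an English-like order and needs no long jump.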

[Figure: Example parse tree]

[Figure: Six reordering rules for German]

Experiments
Experimental setup:
- Data: Europarl corpus, 751,088 parallel sentence pairs.
- Evaluation on 2,000 sentences; average sentence length: 28 words.
- Baseline: phrase-based system without reordering.
Results (BLEU score):
- Baseline: 25.2%
- Reordering: 26.8%
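The scores above are corpus-level BLEU. As a reminder of what that means, here is a minimal Python sketch of BLEU with one reference per sentence, uniform weights over 1- to 4-grams, and whitespace tokenization. It is a simplified illustration; standard BLEU tools differ in tokenization, multiple-reference handling, and smoothing, and the function names here are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, single reference per sentence, uniform weights.

    Clipped n-gram matches and n-gram totals are summed over the whole corpus
    before the precisions are computed, so the score is not an average of
    per-sentence scores.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp, ref = hyp.split(), ref.split()
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
```

Because the counts are pooled before dividing, a single sentence has no BLEU score of its own inside the corpus score, which is the difficulty the significance test on the next slide has to work around.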

Human Translation Judgments
Two annotators judged 100 randomly chosen sentences, each 10 to 20 words long. For each sentence they saw three versions (human translation, baseline output, reordered-system output) and judged whether the reordered output was better than, equal to, or worse than the baseline.

              Better   Equal   Worse
Annotator 1    40%      40%     20%
Annotator 2    44%      37%     19%

Example Output
Human: i think it is wrong in principle to have such measures in the european union.
Reordered: i believe that it is wrong in principle to take such measures in the european union.
Baseline: i believe that it is wrong in principle such measures in the european union to take.

BLEU Statistical Significance
The authors use the sign test for statistical significance. For each test sentence $x$, $f(x)$ is $+$ if the reordered translation is better than the baseline, $-$ if it is worse, and $=$ if they are equal. Let $p_+ = P(f(x) = +)$ and $p_- = P(f(x) = -)$; the null hypothesis is $p_+ = p_-$.
BLEU is a corpus-level metric and gives no per-sentence score, so the authors create an artificial comparison:
- $s$: BLEU score of the baseline output on the whole test set.
- $s_i$: BLEU score of the baseline output with sentence $i$ replaced by the reordered model's translation.
- $f(x_i)$ is $+$ if $s_i > s$, $-$ if $s_i < s$, and $=$ if $s_i = s$.
Result: 52.85% of the sentences improved, 36.4% were worse than the baseline, and 10.75% were equal. At the 95% confidence level, the reordering method significantly improves over the baseline.
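The procedure above can be sketched in Python. Here `corpus_score` stands for any corpus-level metric (BLEU in the paper), the function names are hypothetical, and the p-value comes from an exact binomial sign test with ties dropped. The counts in the example are reconstructed from the reported percentages under the assumption that they refer to the 2,000 evaluation sentences.

```python
from math import comb


def artificial_labels(baseline, reordered, references, corpus_score):
    """Label each sentence +, - or = by swapping it into the baseline output.

    corpus_score(hypotheses, references) is any corpus-level metric, e.g. BLEU.
    """
    s = corpus_score(baseline, references)
    labels = []
    for i in range(len(baseline)):
        mixed = baseline[:i] + [reordered[i]] + baseline[i + 1:]
        s_i = corpus_score(mixed, references)
        labels.append("+" if s_i > s else "-" if s_i < s else "=")
    return labels


def sign_test_p_value(n_plus, n_minus):
    """Exact two-sided sign test under H0: p+ = p-; ties are excluded."""
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k, n + 1))  # P(X >= k) times 2**n
    return min(1.0, 2 * tail / 2 ** n)


# Assumed counts on 2,000 sentences: 52.85% -> 1057 better, 36.4% -> 728 worse
# (the 10.75% -> 215 ties are dropped by the sign test).
print(sign_test_p_value(1057, 728) < 0.05)  # True
```

Swapping in one sentence at a time needs one corpus-score computation per sentence; with BLEU this can be done cheaply by caching the pooled n-gram counts and adjusting them for sentence $i$, but the straightforward loop keeps the idea visible.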

Discussion
- The method clearly improves over the baseline.
- The rules are language-specific; they cannot even be used for English-to-German translation.
- The authors did not try to learn reordering rules automatically.

Second Paper
P. Xu et al.: Using a dependency parser to improve SMT for subject-object-verb languages. NAACL 2009.

Approaches to Syntactic Reordering
- Explicitly model phrase reordering distances, e.g., distance-based distortion models.
- Incorporate syntactic analysis of the target language into both modeling and decoding.
- Reorder source sentences based on syntactic analysis (the approach used in this paper).

Translation Between SVO and SOV Languages
Subject-Verb-Object (SVO) and Subject-Object-Verb (SOV) are two of the most common word orders among the world's languages. English is SVO and Korean is SOV: "John hit the ball." vs. "John the ball hit."
As sentences get longer, the cost of moving structures during decoding (in phrase-based models) can become quite high. For example, in "English is used as the first or second language in many countries around the world.", the verb "is used" has to skip 13 words to reach the end of the sentence.

Precedence Reordering Based on a Dependency Parser
The children of each word (together with the word itself) are assigned a relative ordering. A precedence reordering rule is a mapping from T to a set of tuples {(L, W, O)}:
- T: POS tag of the head word
- L: dependency label of a child
- W: weight indicating the order (highest first, lowest last). Children with the same weight are ordered according to the order type O defined in the rule. Why not simply pre-define unequal weights?
- O: order type; NORMAL preserves the original order, REVERSE flips it.
If a child's label is not listed in the rule, W = 0 and O = NORMAL. The label "self" refers to the head node itself. Punctuation marks and conjunctions disallow movement across them.
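Applying such rules amounts to a recursive linearization of the dependency tree. The Python sketch below is a minimal illustration under several assumptions: the `Node` class, the label names (`nsubj`, `dobj`, `aux`, `punct`, `cc`), the weights in the example rule, flipping a tie group if any of its members says REVERSE, and treating punctuation/conjunction children as fixed barriers are all illustrative choices, not the paper's actual rule set.

```python
from dataclasses import dataclass, field

NORMAL, REVERSE = "NORMAL", "REVERSE"
# Assumption: children with these labels act as barriers; they stay in place and
# other children are reordered only within the stretches between barriers.
BARRIER_LABELS = {"punct", "cc"}


@dataclass
class Node:
    word: str
    pos: str    # POS tag, matched against T in the rules
    label: str  # dependency label to the head, matched against L
    index: int  # position in the original sentence
    children: list = field(default_factory=list)


def linearize(node, rules):
    """Return the words of node's subtree after precedence reordering."""
    rule = rules.get(node.pos, {})  # {label: (weight, order_type)}
    # The head itself takes part in the ordering under the label "self".
    items = [(node.index, "self", node)] + [(c.index, c.label, c) for c in node.children]
    items.sort(key=lambda item: item[0])  # original surface order

    def words(item):
        _, label, target = item
        return [node.word] if label == "self" else linearize(target, rules)

    def order(segment):
        groups = {}
        for item in segment:
            weight, order_type = rule.get(item[1], (0, NORMAL))  # unlisted: W=0, NORMAL
            groups.setdefault(weight, []).append((item, order_type))
        result = []
        for weight in sorted(groups, reverse=True):  # highest weight first
            members = [item for item, _ in groups[weight]]  # stable: original order
            # Assumption: a tie group is flipped if any of its rules says REVERSE.
            if any(order_type == REVERSE for _, order_type in groups[weight]):
                members.reverse()
            for item in members:
                result.extend(words(item))
        return result

    out, segment = [], []
    for item in items:
        if item[1] in BARRIER_LABELS:
            out.extend(order(segment))
            out.extend(words(item))
            segment = []
        else:
            segment.append(item)
    out.extend(order(segment))
    return out


# Hypothetical tree for "John can hit the ball ." and hypothetical VB weights.
tree = Node("hit", "VB", "root", 2, [
    Node("John", "NNP", "nsubj", 0),
    Node("can", "MD", "aux", 1),
    Node("ball", "NN", "dobj", 4, [Node("the", "DT", "det", 3)]),
    Node(".", ".", "punct", 5),
])
rules = {"VB": {"nsubj": (3, NORMAL), "dobj": (2, NORMAL), "self": (1, NORMAL)}}
print(" ".join(linearize(tree, rules)))  # John the ball hit can .
```

With these weights the subject comes first, the object precedes the verb, and the unlisted auxiliary (W = 0) falls after the verb, giving the SOV order shown on the next slide. Because each node is handled once, the reordering is linear in sentence length apart from the small per-node sort.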

Precedence Reordering Based on a Dependency Parser (Example)
After applying the precedence rules, the example sentence becomes: John the ball hit can.

Novelties in This Work
1. The model is more efficient than its counterpart.
2. It outperforms the state of the art (a stronger baseline).
3. It is not restricted to a single language pair.
4. Precedence rules can, in principle, be learned automatically.
5. It uses dependency parse trees rather than constituency trees.

Experiments
- Translation from English into 5 SOV languages.
- Baseline: phrase-based system with a maximum-entropy lexicalized phrase reordering model; maximum allowed reordering distance: 10.
- Parser: deterministic transition-based dependency parser, which parses in linear time.
- Second baseline: hierarchical phrase-based system. It can capture long-distance reordering using a synchronous context-free grammar, but it uses chart parsing during decoding, which is slower than the deterministic dependency parser.
- Evaluation data: 9.5K English sentences from the web, split into 3,500 sentences for development (used for MERT), 1,000 sentences for test, and 5,000 sentences for blind test.

[Figures: experiment details and results]

Discussion
- Reordering is essential when translating between languages with different word orders.
- The method seems to work well for all 5 languages.
- Although the authors claim that the rules can be learned automatically, they did not try it.
- The improvement of the reordered phrase-based system over the hierarchical phrase-based system is not significant.

Thanks!