Reordering Models for Statistical Machine Translation: A Literature Survey

Piyush Dilip Dungarwal (123050083)

June 19, 2014

In this survey, we briefly study various reordering models that are used with statistical translation models. The reordering model is one of the important components of any statistical machine translation system, and the reordering problem is itself NP-hard. We study various reordering approaches that can be used to address it. We first study simple distortion-based reordering, which is used with phrase-based and factored models, and then discuss the limitations of this distance-based approach. Finally, we introduce a source-reordering approach that handles reordering using structural information of the input text, and we study how parse trees and shallow parsing can be used for source-side reordering.

1 Distortion penalty-based reordering

Recall the noisy-channel approach to translation:

e_best = argmax_e p(f|e) p_LM(e)

For phrase-based models, we decompose p(f|e) into:

p(f̄_1^I | ē_1^I) = ∏_{i=1}^{I} φ(f̄_i | ē_i) d(start_i - end_{i-1} - 1)

The foreign sentence f is broken into I phrases, and each foreign phrase f̄_i is translated into an English phrase ē_i. The phrase-based model handles reordering with a distance-based reordering model; the distance is computed relative to the previous phrase, using the notation defined below.
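As a small numeric sketch of the distortion term d, assuming d(x) = α^|x| as defined in this section (the phrase spans and the value of α are invented; start_i and end_i denote the source-side start and end positions of the phrase translated i-th):

```python
# A toy sketch of the distance-based distortion penalty, assuming
# d(x) = alpha^|x| (invented phrase segmentation and alpha value).

def distortion(start_i, end_prev):
    """Words skipped relative to the previous phrase: start_i - end_prev - 1."""
    return start_i - end_prev - 1

# Source-side (start, end) positions of the phrases, listed in the
# order their translations appear in the target sentence.
spans = [(1, 3), (6, 6), (4, 5)]

alpha, end_prev, penalty = 0.5, 0, 1.0
for start, end in spans:
    penalty *= alpha ** abs(distortion(start, end_prev))
    end_prev = end

print(penalty)  # distances 0, 2, -3 -> 0.5**0 * 0.5**2 * 0.5**3 = 0.03125
```

A fully monotone translation has all distances equal to zero, so its distortion penalty is 1; every skipped word shrinks the score by a factor of α.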

start_i is the starting position of the foreign phrase that translates to the ith English phrase, and end_i is the ending position of that phrase. The reordering distance is then computed as start_i - end_{i-1} - 1: it is nothing but the number of words skipped while taking foreign words out of sequence. Figure 1 shows an example of distance-based reordering.

Figure 1: Example of distance-based reordering [Koehn, 2010]

Reordering probabilities d are estimated as d(x) = α^|x|, where α ∈ [0, 1] is a parameter that makes d(·) behave like a probability distribution.

1.1 Lexicalized reordering

Instead of reordering based on phrase positions, lexicalized reordering is conditioned on the actual phrases. There are three types of lexicalized reordering: monotone (m), swap with previous phrase (s), and discontinuous (d); Figure 2 shows the three phrase orientations. We need a probability model that predicts the orientation type of a given phrase, and we use word alignments for this purpose. The orientation type is detected as follows:

Monotone ordering (m): if a word alignment point exists to the top left

Swap with previous phrase (s): if a word alignment point exists to the top right

Discontinuous (d): if no word alignment point exists to the top left or to the top right

The reordering model p_o is then learnt by counting the number of phrases in the training corpus with each orientation, using the maximum-likelihood principle:

p_o(orientation | f̄, ē) = count(orientation, f̄, ē) / ∑_o count(o, f̄, ē)

Figure 2: Example of lexicalized reordering [Koehn, 2010]

2 Source-side reordering

English follows SVO (Subject-Verb-Object) order, while Hindi follows SOV (Subject-Object-Verb) order. Phrase-based and factored models try to handle this reordering phenomenon with a distortion penalty: a word is allowed to be placed anywhere inside a fixed-size window, but moving it outside the window incurs a penalty when the translation is scored. This approach performs well when the ordering of words does not vary much after translation; hence, it is generally not recommended for translation between English and Indian languages.
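The orientation detection and maximum-likelihood estimation described in Section 1.1 can be sketched as follows (the alignment points, phrase pair, and counts are invented; a real extractor reads these from the aligned training corpus):

```python
# Sketch of lexicalized reordering: orientation detection from word
# alignment points and MLE estimation of p_o (all data invented).
from collections import Counter, defaultdict

def orientation(points, src_start, src_end, tgt_start):
    """Classify a phrase pair from alignment points (src_pos, tgt_pos)."""
    if (src_start - 1, tgt_start - 1) in points:
        return "m"  # monotone: alignment point to the top left
    if (src_end + 1, tgt_start - 1) in points:
        return "s"  # swap: alignment point to the top right
    return "d"      # discontinuous: neither point exists

def estimate_p_o(samples):
    """p_o(orientation | f, e) = count(o, f, e) / sum over o' of count(o', f, e)."""
    counts = defaultdict(Counter)
    for f, e, o in samples:
        counts[(f, e)][o] += 1
    return {pair: {o: n / sum(c.values()) for o, n in c.items()}
            for pair, c in counts.items()}

points = {(0, 0), (3, 0)}
print(orientation(points, 1, 2, 1))  # -> "m": point (0, 0) is top left

p_o = estimate_p_o([("maison", "house", "m"),
                    ("maison", "house", "m"),
                    ("maison", "house", "d")])
print(p_o[("maison", "house")])
```

In practice the counts are sparse, so implementations smooth p_o rather than using raw relative frequencies.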

Another useful approach to handling reordering is to convert the source sentence into a sentence whose words are ordered according to the positions of the corresponding words on the target side. This is called source-side reordering. This step is done before actual training: we need to source-reorder the training text as well as the test sentences.

2.1 Parse tree-based reordering

Source-side reordering can be achieved using a syntactic parse tree of the source sentence. We either learn the reordering rules or find them manually. A rich set of rules is created that reorders the children of nodes in the syntactic parse tree [Patel et al., 2013]. Source reordering helps to learn better word alignments and better phrase extraction.

Figure 3: Examples of source-side reordering using parse trees

Approach: The source and target sentences are manually analyzed to derive the tree transformation rules, and from the generated set we select the rules that seem most generic. There are cases where more than one correct transformation of an English sentence exists, since the target language (Hindi) is, within certain limits, a free word order language. In such cases, the word order closest to the English structure is preferred over other orders possible in Hindi. Five categories are the most prominent candidates for reordering: VPs (verb phrases), NPs (noun phrases), ADJPs (adjective phrases), PPs (preposition phrases), and ADVPs (adverb phrases).

2.2 Chunk-based reordering

When translating from a source language that has no constituency or dependency parser, it is very difficult to reorder the source sentence to match the word order of the target sentence. In that case we can use shallow parsing techniques for source-side reordering: chunk-level tagging can be seen as an intermediate annotation between POS tagging and parsing. The overall architecture of a translation system with chunk-based source reordering is shown in Figure 4.

Figure 4: Architecture of a translation system with and without chunk-based source reordering [Zhang et al., 2007]

A reordering lattice, instead of a single sentence, is used as input to the translation system. The lattice makes it possible to consider all source-reordered alternatives together with their probabilistic scores. We first POS tag the input sentence and obtain chunk-level annotations; reordering rules are then applied to these chunks to produce the reordering lattice.

Learning reordering rules: Reordering rules can be learnt from the training corpus or formulated manually. Each rule consists of a left hand side (lhs) and a right hand side (rhs): the lhs contains chunks and POS tags, whereas the rhs gives the reordered positions of those chunks and POS tags. Multiple rules may share the same lhs, and rules may also describe monotone-ordered chunk sequences. Figure 5 shows some examples of reordering rules.

Figure 5: Examples of reordering rules [Zhang et al., 2007]

Reordering rules are extracted from the word alignments and the source chunks. We obtain word alignments from the GIZA++ aligner, convert these word-to-word alignments into chunk-to-word alignments, and then apply the phrase-extraction algorithm to the chunk-to-word alignments. Cross phrases are discarded; Figure 6(c) shows an example of a cross phrase. The phrases in Figures 6(a) and 6(b) are accepted for reordering rules: Figure 6(a) is a monotone phrase, whereas Figure 6(b) is a reordering phrase.

Figure 6: Examples of phrase extraction [Zhang et al., 2007]

Reordering lattice generation: After chunking the source sentence, we look for reordering rules whose lhs matches a tag sequence of the input sentence; many paths are generated depending on which rules apply. For words not covered by any rule, we use their POS tags. Figure 7 shows an example of applying reordering rules to a Chinese sentence.

Figure 7: Example of rule application [Zhang et al., 2007]

Each reordering S thus generated is stored in the lattice and given a weight W, computed using a source language model p(S). Besides a word N-gram model, a POS-tag N-gram model or a chunk-tag N-gram model can be used as well; these models can be learnt from the tagged source training corpus.

Thus, we have studied different approaches to reordering that can be used with statistical machine translation.

Summary

We studied the distance-based distortion reordering model and lexicalized reordering.

We studied the limitations of the distortion reordering model and introduced source reordering.

We studied how a parse tree-based source reordering model can be learnt for English.

We studied how a chunk-based source reordering model can be learnt for Indian languages.

References

Genzel, Dmitriy. Automatically learning source-side reordering rules for large scale machine translation. Proceedings of the 23rd International Conference on Computational Linguistics, Association for Computational Linguistics, 2010.

Koehn, Philipp, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48-54, 2003.

Koehn, Philipp and Hieu Hoang. Factored translation models. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, ACL, pages 868-876, 2007.

Koehn, Philipp. Statistical Machine Translation. Cambridge University Press, 2010.

Patel, Raj Nath, Rohit Gupta, Prakash Pimpale, and Sasikumar M. Reordering rules for English-Hindi SMT. Proceedings of the Second Workshop on Hybrid Approaches to Translation, Association for Computational Linguistics, pages 34-41, 2013.

Ramanathan, Ananthakrishnan, Bhattacharyya P., Hegde J. J., Shah R. M., and Sasikumar M. Simple syntactic and morphological processing can help English-Hindi statistical machine translation. Proceedings of IJCNLP, 2008.

Zhang, Yuqi, Richard Zens, and Hermann Ney. Chunk-level reordering of source language sentences with automatically learned rules for statistical machine translation. Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation, Association for Computational Linguistics, 2007.