Factored models for phrase-based translation


Factored models for phrase-based translation
LING 575 Lecture 7
Kristina Toutanova, MSR & UW
May 18, 2010
With slides mostly borrowed from Philipp Koehn

Assignments
- Project updates due May 19. Guidelines on the web site. A couple of paragraphs to one page of update; not graded.
- Paper reviews due May 26 (for people who did not do a paper presentation). Guidelines on the web page.

Overview
- Motivation for factored models
- Example
- Model and Training
- Alternate Decoding Paths
- Decoding
- Applications: enriching the output space, translating factored words, enriching the input space

Statistical machine translation today
- The best performing methods are based on phrases (short sequences of words), with no use of explicit syntactic information and no use of morphological information.
- Progress in syntax-based translation: tree transfer models using syntactic annotation, still with a shallow representation of words and non-terminals; an active research area with improving performance.

One motivation: morphology
- Current models treat car and cars as completely different words: training occurrences of car have no effect on learning the translation of cars, and if we only see car, we do not know how to translate cars.
- Rich morphology (German, Arabic, Finnish, Czech, ...) means many word forms.
- Better approach: analyze surface word forms into lemma and morphology, e.g. car +plural; translate lemma and morphology separately; generate the target surface form. A sketch of this pipeline follows.
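The following minimal Python sketch (with a hypothetical toy lexicon; the table entries are illustrative, not from the lecture) shows the analyze-translate-generate pipeline, so that evidence for car also covers cars:

    # Hypothetical analysis, lemma-translation, and generation tables.
    ANALYZE = {"Autos": ("Auto", "+plural"), "Auto": ("Auto", "+singular")}
    LEMMA_TRANS = {"Auto": "car"}
    GENERATE = {("car", "+plural"): "cars", ("car", "+singular"): "car"}

    def translate(surface):
        lemma, morph = ANALYZE[surface]            # analyze: Autos -> Auto +plural
        target_lemma = LEMMA_TRANS[lemma]          # translate the lemma only
        return GENERATE[(target_lemma, morph)]     # generate: car +plural -> cars

    print(translate("Autos"))  # -> cars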

Factored representation of words
Factored translation models map between factored representations of input and output words: word, lemma, part-of-speech, morphology, word class, ...
Goals:
- Generalization, e.g. by translating lemmas, not surface forms
- Richer models, e.g. using syntax for reordering and language modeling

Related work
- Back off to representations with richer statistics (lemma, etc.) [Nießen and Ney, 2001; Yang and Kirchhoff, 2006; Talbot and Osborne, 2006]
- Use of additional annotation in pre-processing (POS, syntax trees, etc.) [Collins et al., 2005; Crego et al., 2006]
- Use of additional annotation in re-ranking (morphological features, POS, syntax trees, etc.) [Och et al., 2004; Koehn and Knight, 2005]
- We pursue an integrated approach.
- Use of syntactic tree structure [Wu, 1997; Alshawi et al., 1998; Yamada and Knight, 2001; Melamed, 2004; Menezes and Quirk, 2005; Chiang, 2005; Galley et al., 2006] may be combined with our approach.

Factored Translation Models: Motivation, Example, Model and Training, Decoding, Experiments

Decomposing translation: example
Translate lemma and syntactic information separately: lemma → lemma, part-of-speech → part-of-speech, morphology → morphology.

Decomposing translation: example
Generate the surface form on the target side: lemma + part-of-speech + morphology → surface.

Translation process: example
Input: (Autos, Auto, NNS)
1. Translation step, lemma → lemma: (?, car, ?), (?, auto, ?)
2. Generation step, lemma → part-of-speech: (?, car, NN), (?, car, NNS), (?, auto, NN), (?, auto, NNS)
3. Translation step, part-of-speech → part-of-speech: (?, car, NN), (?, car, NNS), (?, auto, NNP), (?, auto, NNS)
4. Generation step, lemma + part-of-speech → surface: (car, car, NN), (cars, car, NNS), (auto, auto, NN), (autos, auto, NNS)
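A sketch of this step-by-step expansion in Python (hypothetical toy tables; step 3 is simplified to an identity mapping, unlike the NN → NNP option above):

    # Toy mapping tables for a single input word, for illustration only.
    LEMMA_TRANS = {"Auto": ["car", "auto"]}                       # translation: lemma -> lemma
    LEMMA_TO_POS = {"car": ["NN", "NNS"], "auto": ["NN", "NNS"]}  # generation: lemma -> POS
    POS_TRANS = {"NN": ["NN"], "NNS": ["NNS"]}                    # translation: POS -> POS
    GEN_SURFACE = {("car", "NN"): "car", ("car", "NNS"): "cars",
                   ("auto", "NN"): "auto", ("auto", "NNS"): "autos"}

    def expand(src_lemma):
        options = []
        for lemma in LEMMA_TRANS[src_lemma]:      # step 1
            for pos in LEMMA_TO_POS[lemma]:       # step 2
                for tpos in POS_TRANS[pos]:       # step 3
                    options.append((GEN_SURFACE[(lemma, tpos)], lemma, tpos))  # step 4
        return options

    print(expand("Auto"))
    # [('car', 'car', 'NN'), ('cars', 'car', 'NNS'), ('auto', 'auto', 'NN'), ('autos', 'auto', 'NNS')]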

Factored Translation Models: Motivation, Example, Model and Training, Decoding, Experiments

Model
- Extension of the phrase model: the mapping of foreign words into English words is broken up into steps.
- Translation step: maps foreign factors into English factors (on the phrasal level).
- Generation step: maps English factors into English factors (for each word).
- Each step is modeled by one or more feature functions; this fits nicely into the log-linear model, with weights set by a discriminative training method.
- The order of mapping steps is chosen to optimize search.

Phrase-based training
Establish a word alignment (GIZA++ and symmetrization), e.g. between "natürlich hat john spass am spiel" and "naturally john has fun with the game".

Phrase-based training
Extract phrases consistent with the word alignment, e.g. "natürlich hat john" ↔ "naturally john has".

Factored training
Annotate the training data with factors and extract phrases over factors: from the tagged pair ADV V NNP NN P NN ↔ ADV NNP V NN P DET NN, extract the POS phrase pair ADV V NNP ↔ ADV NNP V.

Training of generation steps
- Generation steps map target factors to target factors. They are typically trained on the target side of the parallel corpus, but may be trained on additional monolingual data.
- Example: The/det man/nn sleeps/vbz
- Count collection: count(the, det)++, count(man, nn)++, count(sleeps, vbz)++
- The counts give evidence for the probability distributions (maximum likelihood estimation): p(det | the), p(the | det); p(nn | man), p(man | nn); p(vbz | sleeps), p(sleeps | vbz). See the sketch below.
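A minimal sketch of this training procedure, using the tagged example above as a one-sentence corpus:

    from collections import Counter

    # Collect (word, tag) counts from a POS-tagged target corpus.
    corpus = [("the", "det"), ("man", "nn"), ("sleeps", "vbz")]
    pair_counts = Counter(corpus)
    word_counts = Counter(w for w, _ in corpus)
    tag_counts = Counter(t for _, t in corpus)

    def p_tag_given_word(t, w):   # maximum likelihood estimate of p(tag | word)
        return pair_counts[(w, t)] / word_counts[w]

    def p_word_given_tag(w, t):   # maximum likelihood estimate of p(word | tag)
        return pair_counts[(w, t)] / tag_counts[t]

    print(p_tag_given_word("det", "the"), p_word_given_tag("man", "nn"))  # 1.0 1.0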

Model form
In standard phrase-based MT we have scores of phrase pairs:

    score(f1 ... fm, e1 ... en) = λ1 log P(f1 ... fm | e1 ... en) + λ2 log P(e1 ... en | f1 ... fm)
                                + λ3 log P_lex(f1 ... fm | e1 ... en) + λ4 log P_lex(e1 ... en | f1 ... fm) + λ5

Now the scores of phrase pairs are decomposed into scores for the translation and generation steps within the phrase pair. Take this model:

Model form: equation
Factor each source word as fj = (fj, lfj, posfj) (surface, lemma, POS) and each target word as ei = (ei, lei, posei, mei) (surface, lemma, POS, morphology). The score of a factored phrase pair then decomposes as:

    score((f1, lf1, posf1) ... (fm, lfm, posfm), (e1, le1, pose1, me1) ... (en, len, posen, men))
      = score(lf1 ... lfm, le1 ... len)                                [lemma translation step]
      + score(posf1 ... posfm, pose1 me1 ... posen men)                [POS + morphology translation step]
      + score_gen(e1, pose1 me1) + ... + score_gen(en, posen men)      [per-word generation steps]
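A small sketch of this decomposition as code (the component models are hypothetical stand-ins passed in as functions):

    # Score a factored phrase pair as a sum of step scores.
    # src: list of (surface, lemma, pos) tuples; tgt: list of (surface, lemma, pos, morph) tuples.
    def factored_score(src, tgt, lemma_model, posmorph_model, gen_model):
        score = lemma_model([f[1] for f in src], [e[1] for e in tgt])              # lemma -> lemma
        score += posmorph_model([f[2] for f in src], [(e[2], e[3]) for e in tgt])  # POS -> POS + morph
        score += sum(gen_model(e[0], (e[2], e[3])) for e in tgt)                   # per-word generation
        return score

    # Toy usage with constant component models:
    s = factored_score([("Autos", "Auto", "NNS")],
                       [("cars", "car", "NNS", "plural")],
                       lambda f, e: -1.0, lambda f, e: -0.5, lambda w, t: -0.1)
    print(s)  # -1.6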

Factored Translation Models: Motivation, Example, Model and Training, Decoding, Experiments

Phrase-based translation
Task: translate this sentence from German into English: er geht ja nicht nach hause

Translation step 1
Pick a phrase in the input and translate it: er → he

Translation step 2
Pick a phrase in the input and translate it: ja nicht → does not. It is allowed to pick words out of sequence (reordering), and phrases may have multiple words: many-to-many translation.

Translation step 3
Pick a phrase in the input and translate it: geht → go, giving "he does not go".

Translation step 4
Pick a phrase in the input and translate it: nach hause → home, giving "he does not go home".

Translation options
[Figure: table of candidate translations for each phrase of "er geht ja nicht nach hause", e.g. er → he / it, geht → goes / go / is, ja nicht → does not / is not, nach hause → home / at home.]
There are many translation options to choose from: the Europarl phrase table contains 2727 matching phrase pairs for this sentence; by pruning to the top 20 per phrase, 202 translation options remain (a pruning sketch follows).
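A sketch of that per-phrase pruning (hypothetical table entries):

    # Keep only the n best-scoring translation options per source phrase.
    def prune_options(options_by_phrase, n=20):
        return {src: sorted(opts, key=lambda o: o[1], reverse=True)[:n]
                for src, opts in options_by_phrase.items()}

    table = {"nach hause": [("home", -0.2), ("at home", -1.1), ("after house", -4.0)]}
    print(prune_options(table, n=2))  # keeps the two best options for "nach hause"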

Translation options
[Figure: the same translation-option table.]
The machine translation decoder does not know the right answer; the search problem is solved by heuristic beam search.

Decoding process: precompute the translation options for "er geht ja nicht nach hause".

Decoding process: start with the initial (empty) hypothesis.

Decoding process: hypothesis expansion, e.g. adding the option "are".

Decoding process: hypothesis expansion continues, creating alternative partial hypotheses ("he", "are", "it").

Decoding process: hypothesis expansion builds a search graph of partial translations (e.g. "he" / "it" / "yes", "does not go home", "goes home", "to").

Decoding process: find the best path through the search graph.

Factored model decoding
- Factored model decoding introduces additional complexity: hypothesis expansion no longer works from a simple translation table, but by executing a number of mapping steps, e.g.: 1. translation of lemma → lemma; 2. translation of part-of-speech, morphology → part-of-speech, morphology; 3. generation of the surface form.
- Example: haus | NN | neutral plural nominative → { houses | house | NN plural, homes | home | NN plural, buildings | building | NN plural, shells | shell | NN plural }
- Each time a hypothesis is expanded, these mapping steps have to be applied.

Efficient factored model decoding
Key insight: the execution of the mapping steps can be pre-computed and stored as translation options.
- Apply the mapping steps to all input phrases.
- Store the results as translation options, e.g. haus | NN | neutral plural nominative → houses | house | NN plural; homes | home | NN plural; buildings | building | NN plural; shells | shell | NN plural.
- The decoding algorithm itself is unchanged.

Efficient factored model decoding
- Problem: explosion of translation options. These were originally limited to 20 per input phrase; even with a simple model, thousands of mapping expansions are now possible.
- Solution: additional pruning of translation options. Keep only the best expanded translation options (current default: 50 per input phrase); decoding is then only about 2-3 times slower than with the surface model. A combined sketch of the precomputation and pruning follows.
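A sketch combining the precomputation and pruning (the mapping steps are hypothetical functions from a partial option to scored expansions):

    # Compose all mapping steps per input phrase once, prune to a beam,
    # and store the results as ordinary translation options.
    def precompute_options(input_phrases, steps, beam=50):
        table = {}
        for phrase in input_phrases:
            hyps = [(phrase, 0.0)]
            for step in steps:  # each translation or generation step expands the options
                hyps = [(out, score + s) for hyp, score in hyps for out, s in step(hyp)]
                hyps = sorted(hyps, key=lambda x: x[1], reverse=True)[:beam]
            table[phrase] = hyps  # the beam-search decoder then uses this table unchanged
        return table

    # Toy usage: one lemma-translation step, then one generation step.
    lemma_step = lambda p: [("house", -0.3), ("home", -0.5)]
    gen_step = lambda p: [((p, "NN"), -0.1)]
    print(precompute_options(["haus"], [lemma_step, gen_step]))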

Factored Translation Models: Motivation, Example, Model and Training, Decoding, Experiments

Adding linguistic markup to output
Input: word. Output: word + part-of-speech.
- Generation of POS tags on the target side.
- Use of high-order language models over POS (7-gram, 9-gram).
- Motivation: syntactic tags should enforce syntactic sentence structure; the model is not strong enough to support major restructuring.

Some experiments
English → German, Europarl, 30 million words, test2006:
  Model                  BLEU
  best published result  18.15
  baseline (surface)     18.04
  surface + POS          18.15
German → English, News Commentary data (WMT 2007), 1 million words:
  Model        BLEU
  baseline     18.19
  with POS LM  19.05
Improvements under sparse data conditions. Similar results with CCG supertags [Birch et al., 2007].

Sequence models over morphological tags
  die     hellen    Sterne   erleuchten    das      schwarze   Himmel
  (the)   (bright)  (stars)  (illuminate)  (the)    (black)    (sky)
  fem     fem       fem      -             neutral  neutral    male
  plural  plural    plural   plural        sgl.     sgl.       sgl.
  nom.    nom.      nom.     -             acc.     acc.       acc.
This is a violation of noun phrase agreement in gender: "das schwarze" and "schwarze Himmel" are perfectly fine bigrams, but "das schwarze Himmel" is not. If the relevant n-grams do not occur in the corpus, a lexical n-gram model would fail to detect this mistake. A morphological sequence model can: p(N-male | J-male) > p(N-male | J-neutral). See the sketch below.
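A minimal sketch of such a sequence model over gender tags (toy counts; the numbers are hypothetical):

    from collections import Counter

    # Bigram counts over adjective/noun gender tags (illustrative numbers).
    tag_bigrams = Counter({("J-male", "N-male"): 90,
                           ("J-neutral", "N-neutral"): 85,
                           ("J-neutral", "N-male"): 2})
    tag_unigrams = Counter({"J-male": 100, "J-neutral": 100})

    def p(next_tag, prev_tag):  # maximum likelihood bigram probability
        return tag_bigrams[(prev_tag, next_tag)] / tag_unigrams[prev_tag]

    # Agreement is preferred even if the word bigram itself is unseen.
    assert p("N-male", "J-male") > p("N-male", "J-neutral")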

Local agreement (esp. within noun phrases)
Input: word. Output: word + part-of-speech + morphology.
- High-order language models over POS and morphology.
- Motivation: DET-sgl NOUN-sgl is a good sequence; DET-sgl NOUN-plural is a bad sequence.

Agreement within noun phrases
Experiment: 7-gram POS and morphology LMs in addition to the 3-gram word LM.
Results:
  Method          Agreement errors in NPs >= 3 words   devtest BLEU   test BLEU
  baseline        15%                                  18.22          18.04
  factored model  4%                                   18.25          18.22
Examples:
  baseline: ... zur zwischenstaatlichen methoden ...   factored model: ... zu zwischenstaatlichen methoden ...
  baseline: ... das zweite wichtige änderung ...       factored model: ... die zweite wichtige änderung ...

Other results on enriching output [Koehn and Hoang, 2007]
[Figure: BLEU comparison under 40K and 20K training-sentence conditions.]

Morphological generation model
Input: word + lemma + part-of-speech. Output: word + lemma + part-of-speech + morphology.
Our motivating example: translating lemma and morphological information separately should be more robust.

Initial results
Results on the 1 million word News Commentary corpus (German → English):
  System          In-domain   Out-of-domain
  Baseline        18.19       15.01
  With POS LM     19.05       15.03
  Morphgen model  14.38       11.65
What went wrong? Why back off to the lemma when we know how to translate the surface forms? Loss of information.

Solution: alternative decoding paths
Input: word, or lemma + part-of-speech + morphology. Output: word + lemma + part-of-speech + morphology.
Allow both surface form translation and the morphgen model: prefer the surface model for known words; the morphgen model acts as a back-off.

Results
The model with both paths now beats the baseline:
  System            In-domain   Out-of-domain
  Baseline          18.19       15.01
  With POS LM       19.05       15.03
  Morphgen model    14.38       11.65
  Both model paths  19.47       15.23

Specifying factored models in Moses: example

  train-factored-phrase-model.perl \
    --corpus factored-corpus/projsyndicate.1000 \
    --root-dir pos-decomposed \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0 \
    --lm 1:3:factored-corpus/pos.lm:0 \
    --translation-factors 0-0 \
    --generation-factors 0-1 \
    --decoding-steps t0,g0

Here factor 0 is the surface form and factor 1 is POS: one translation step over surface forms (0-0), and one generation step from surface form to POS (0-1) to feed the POS language model.

Specifying factored models in Moses: example

  train-factored-phrase-model.perl \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0 \
    --lm 2:3:factored-corpus/pos.lm:0 \
    --translation-factors 1-1+2-2,3 \
    --generation-factors 1,2,3-0 \
    --decoding-steps t0,t1,g0

With factors 0 = surface, 1 = lemma, 2 = POS, 3 = morphology: two translation steps (lemma → lemma; POS → POS + morphology) and one generation step (lemma + POS + morphology → surface form).

Specifying factored models in Moses: multiple decoding paths

  train-factored-phrase-model.perl \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0 \
    --lm 2:3:factored-corpus/pos.lm:0 \
    --translation-factors 1-1+2-2,3+0-0,2 \
    --generation-factors 1,2,3-0 \
    --decoding-steps t0,t1,g0:t2

The colon in --decoding-steps separates the two alternative decoding paths: the morphgen path (t0,t1,g0) and the direct surface path (t2).

Adding annotation to the source
- Source words may lack sufficient information to map phrases. English → German: what case for noun phrases? Chinese → English: plural or singular pronoun translation depends on what the pronouns refer to.
- Idea: add information to the source that makes the required information available locally, where it is needed.
- See [Avramidis and Koehn, ACL 2008] for details.

Error analysis for an English-Greek baseline phrasal system

Case information for English → Greek
Input: word. Output: word + subject/object case.
- Detect in English whether a noun phrase is a subject or an object (using the parse tree).
- Map this information onto the case morphology of Greek.
- Use the case morphology to generate the correct word form.

Obtaining case information
Use a syntactic parse of the English input; the method is similar to semantic role labeling. A rough sketch follows.
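A rough sketch of this source-side annotation (a hypothetical heuristic over dependency labels, not the classifier of the paper):

    # Map dependency labels to the case the Greek translation should carry.
    DEP_TO_CASE = {"nsubj": "nom", "dobj": "acc"}

    def add_case_factor(parsed_tokens):
        # parsed_tokens: [(word, dependency_label), ...] from any English parser
        return [f"{word}|{DEP_TO_CASE.get(dep, '-')}" for word, dep in parsed_tokens]

    print(add_case_factor([("dog", "nsubj"), ("bites", "root"), ("man", "dobj")]))
    # ['dog|nom', 'bites|-', 'man|acc']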

Results English → Greek
Automatic BLEU scores:
  System    devtest   test07
  baseline  18.13     18.05
  enriched  18.21     18.20
Improvement in verb inflection:
  System    Verb count   Errors   Missing
  baseline  311          19.0%    7.4%
  enriched  294          5.4%     2.7%
Improvement in noun phrase inflection:
  System    NPs   Errors   Missing
  baseline  247   8.1%     3.2%
  enriched  239   5.0%     5.0%
Also successfully applied to English → Czech.

Summary
- Factored translation models make it possible to model words as sets of features (factors).
- We can use this to build POS-based language models for the target; good empirical improvements with 7-gram LMs over output syntactic factors.
- We can use this to represent the translation of phrases as translation of parts of the words in those phrases, e.g. lemma/morphology. With multiple decoding paths we can avoid the strong independence assumptions; good empirical improvements in small/medium data conditions.
- We can enrich the word representation of an input language to aid translation into a morphologically richer language; good improvements on specific linguistic phenomena, though not a huge boost to overall BLEU.

References
- Philipp Koehn and Hieu Hoang. Factored Translation Models. EMNLP 2007.
- Eleftherios Avramidis and Philipp Koehn. Enriching Morphologically Poor Languages for Statistical Machine Translation. ACL 2008.