Outline
Factored translation models
N-gram-based translation models
Hiero
Syntax-based translation systems


Maxim Khalilov (TAUS Labs, Amsterdam) and Marta R. Costa-jussà (Barcelona Media, Barcelona). RuSSIR 2012, August 5-10, 2012.

Factored translation models

Factored translation models are an extension to phrase-based models in which every word is replaced by a vector of factors:

(word) → (word, lemma, PoS, morphology, ...)

The translation is now a combination of pure translation (T) and generation (G) steps:

lemma_f      --T-->  lemma_e
PoS_f        --T-->  PoS_e
morphology_f --T-->  morphology_e
word_f               word_e   (G: generated from the target-side factors)
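To make the decomposition concrete, here is a minimal sketch in Python. All the factor tables are hypothetical toy stand-ins; a real system such as Moses estimates them from an annotated parallel corpus.

```python
# Toy factored translation of Spanish "casas" -> English "houses".
# Every table below is hypothetical; real models are learned from data.

src = {"word": "casas", "lemma": "casa", "PoS": "NN", "morph": "fem.pl"}

# Translation (T) steps: map each source factor to a target factor.
t_lemma = {"casa": "house"}
t_pos   = {"NN": "NN"}
t_morph = {"fem.pl": "pl"}

# Generation (G) step: trained on the target side only, it produces the
# target surface form from the translated factors.
g_surface = {("house", "NN", "pl"): "houses"}

tgt = (t_lemma[src["lemma"]], t_pos[src["PoS"]], t_morph[src["morph"]])
print(g_surface[tgt])  # -> houses
```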

Factored translation models

What differs in factored translation models, as compared to standard phrase-based models:
The parallel corpus must be annotated beforehand.
Extra language models for every factor can also be used.
Translation steps are accomplished in a similar way.
Generation steps imply training only on the target side of the corpus.
Models corresponding to the different factors and components are combined in a log-linear fashion.

Results (BLEU):

English-German
  best published result             18.15%
  baseline (surface)                18.04%
  surface + POS                     18.15%
  surface + POS + morph             18.22%

English-Spanish
  baseline (surface)                23.41%
  surface + morph                   24.66%
  surface + POS + morph             24.25%

English-Czech
  baseline (surface)                25.82%
  surface + all morph               27.04%
  surface + case/number/gender      27.45%
  surface + CNG/verb/prepositions   27.62%

[Figure, factored representation: input and output words each carry the factors word, lemma, POS, morphology, word class.]
[Figure, factored model: transfer steps map input factors to output factors; a generation step produces the output word from its factors.]

N-gram-based translation models

Log-linear combination of feature functions:

\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(s_1^J, t_1^I) \right\}

Feature functions:
bilingual N-gram translation language model
target language model
word bonus model
source-to-target lexicon model (IBM1 probabilities)
target-to-source lexicon model (IBM1 probabilities)
target POS language model

The bilingual N-gram model:

h_{BM}(s_1^J, t_1^I) = \log \prod_{i=1}^{K} p((s,t)_i \mid (s,t)_{i-N+1}, \ldots, (s,t)_{i-1})

Given a word alignment, tuples T_i = (s,t)_i are the bilingual units:
having minimal length
describing a monotonic segmentation of each sentence pair

Tuples are extracted from the word alignment so that a unique, monotonic segmentation of each sentence pair is produced:
no word in a tuple is aligned to words outside of it;
no smaller tuples can be extracted without violating the previous constraints.

The constraints define a unique possible segmentation, except for NULL-source tuples.
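A toy implementation of the segmentation (a sketch, not the actual UPC code): cut the sentence pair at every point where no alignment link crosses, which yields the unique minimal monotonic tuple sequence. Unaligned words and NULL-source tuples are handled only crudely here.

```python
def extract_tuples(src, tgt, links):
    """Minimal monotonic bilingual segmentation.
    links: set of (i, j) alignment pairs, 0-based source/target positions."""
    tuples, s0, t0 = [], 0, 0
    for cut in range(1, len(src) + 1):
        left  = {j for i, j in links if i < cut}   # target words linked before the cut
        right = {j for i, j in links if i >= cut}  # target words linked after it
        t_cut = len(tgt) if cut == len(src) else max(left, default=t0 - 1) + 1
        # valid cut: no target word is linked to both sides of it
        if left.isdisjoint(right) and all(j >= t_cut for j in right):
            tuples.append((src[s0:cut], tgt[t0:t_cut]))
            s0, t0 = cut, t_cut
    return tuples

src = "quiero ir a casa".split()
tgt = "I want to go home".split()
links = {(0, 0), (0, 1), (1, 3), (2, 2), (3, 4)}
for s, t in extract_tuples(src, tgt, links):
    print(" ".join(s), "|||", " ".join(t))
# quiero ||| I want
# ir a   ||| to go
# casa   ||| home
```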

N-gram-based translation models

Feature functions:

target language model:
h_{TM}(s,t) = h_{TM}(t) = \log \prod_{k=1}^{K} p(w_k \mid w_{k-N+1}, \ldots, w_{k-1})

word bonus model:
h_{WB}(s,t) = h_{WB}(t) = K

source-to-target lexicon model:
h_{LE}(s,t) = \log \frac{1}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p_{\mathrm{IBM1}}(t_j^n \mid s_i^n)

target-to-source lexicon model: analogous.

Results. WMT 09 (large corpus, Es<->En):

Spanish-to-English                      English-to-Spanish
System     BLEU        Constr.  Rank.   System     BLEU        Constr.  Rank.
GOOGLE     0.29        NO       .70     GOOGLE     0.28        NO       .65
UEDIN      0.26        YES      .56     NUS        0.25        YES      .59
UPC-TALP   0.26 (2-3)  YES      .59 (2) UEDIN      0.25        YES      .66
NICT       0.22        YES      .37     UPC-TALP   0.25 (2-4)  YES      .58 (5)
RBMT       0.20        NO       .55     RBMT       0.22        NO       .64
SAAR       0.20        NO       .51     RWTH       0.22        YES      .51
                                        SAAR       0.20        NO       .48

IWSLT 08 (small amount of data, Zh->Es, direct and pivoting through English):

Zh2Es                                   Zh2(En)2Es
System    BLEU    BLEU   Rank.          System    BLEU      BLEU        Rank.
          (Clean) (ASR)                           (Clean)   (ASR)
TCH       34.57   30.52  47.73          TCH       40.42     35.43       49.32
FBK       29.60   24.24  33.42          FBK       39.41     32.51       39.90
DCU       27.10   23.89  28.99          UPC-TALP  38.09 (3) 32.51 (3-4) 39.01 (3)
TTK       26.62   24.40  28.99          NICT      37.11     32.81       30.88
NICT      26.41   23.31  29.79          DCU       32.42     28.47       31.72
PT        25.72   20.10  19.77          TTK       31.88     28.15       34.16
UPC-TALP  25.65(7) 22.14(6) 26.42 (6)   GREYC     19.70     18.91       15.46
GREYC     15.80   15.05  15.46          QMUL      2.87      11.59       17.72

Decoding: the freely available MARIE decoder [Crego et al., 2005] (beam search with hypothesis recombination, threshold and histogram pruning); no rescoring module (the 1-best output is used); monotone and reordered search.

Feature-function weights are optimized with the Downhill Simplex Method.

Reference: José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, Marta R. Costa-jussà. N-gram-based Machine Translation. Computational Linguistics, 2006.
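Before moving on, the feature functions above can be sketched in a few lines of Python. The probability tables are hypothetical stand-ins for models estimated from the aligned corpus, and a small floor value replaces proper smoothing.

```python
import math

def h_word_bonus(tgt):
    # h_WB(s, t) = K: the target length, countering the LM's bias for short output
    return len(tgt)

def h_target_lm(tgt, lm, n=3, floor=1e-9):
    # h_TM(s, t) = log prod_k p(w_k | w_{k-n+1} ... w_{k-1})
    padded = ["<s>"] * (n - 1) + tgt
    return sum(math.log(lm.get(tuple(padded[k:k + n]), floor))
               for k in range(len(tgt)))

def h_lexicon(src, tgt, p_ibm1, floor=1e-9):
    # h_LE(s, t) = log [ (I+1)^-J * prod_j sum_i p_IBM1(t_j | s_i) ], with s_0 = NULL
    I, J = len(src), len(tgt)
    score = -J * math.log(I + 1)
    for tj in tgt:
        score += math.log(sum(p_ibm1.get((tj, si), floor)
                              for si in ["NULL"] + src))
    return score

def loglinear_score(src, tgt, lm, p_ibm1, weights=(1.0, 0.5, 1.0)):
    # sum_m lambda_m * h_m(s, t), as in the argmax above (toy weights)
    feats = (h_target_lm(tgt, lm), h_word_bonus(tgt), h_lexicon(src, tgt, p_ibm1))
    return sum(w * h for w, h in zip(weights, feats))
```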

Syntax-based translation systems

A Motivating Example from Chiang 2007

A Chinese sentence:
Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi
Australia is with North Korea have dipl. rels. that few countries one of

The English translation:
Australia is one of the few countries that have diplomatic relations with North Korea

Why the difference in word order?
shaoshu guojia zhiyi → one of the few countries
yu Beihan you bangjiao de → that have diplomatic relations with North Korea

A Solution: Hierarchical Phrases

Output from a phrase-based system:
[Aozhou] [shi]_1 [yu Beihan]_2 [you] [bangjiao] [de shaoshu guojia zhiyi]
[Australia] [has] [dipl. rels.] [with North Korea]_2 [is]_1 [one of the few countries]

Hierarchical phrases needed for this example:
X → ⟨yu X_1 you X_2, have X_2 with X_1⟩
X → ⟨X_1 de X_2, the X_2 that X_1⟩
X → ⟨X_1 zhiyi, one of X_1⟩

(We'll see how to formalize this next.)
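To see how these three hierarchical phrases produce the correct reordering, here is a small sketch that treats each rule as a synchronous rewrite and expands a hand-built derivation of the large constituent (the rule set mixes the hierarchical phrases above with plain lexical pairs; "Australia is" comes from ordinary phrases outside this sketch):

```python
# Sketch: Chiang-style hierarchical rules as synchronous rewrites.

RULES = [
    ("yu X_1 you X_2", "have X_2 with X_1"),   # 0
    ("X_1 de X_2", "the X_2 that X_1"),        # 1
    ("X_1 zhiyi", "one of X_1"),               # 2
    ("Beihan", "North Korea"),                 # 3
    ("bangjiao", "diplomatic relations"),      # 4
    ("shaoshu guojia", "few countries"),       # 5
]

def derive(node):
    """node = (rule index, [sub-derivations bound to X_1, X_2, ...]);
    returns the (Chinese, English) yield of the synchronous derivation."""
    src, tgt = RULES[node[0]]
    for k, (cs, ct) in enumerate(map(derive, node[1]), start=1):
        src = src.replace(f"X_{k}", cs)
        tgt = tgt.replace(f"X_{k}", ct)
    return src, tgt

# Rule 2 over rule 1 over rule 0, with lexical rules at the leaves:
derivation = (2, [(1, [(0, [(3, []), (4, [])]), (5, [])])])
print(derive(derivation)[1])
# one of the few countries that have diplomatic relations with North Korea
```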

Examples of SCFG Rules (from Chiang 2007)

X → ⟨yu X_1 you X_2, have X_2 with X_1⟩
X → ⟨X_1 de X_2, the X_2 that X_1⟩
X → ⟨X_1 zhiyi, one of X_1⟩

Note: these rules make use of a single non-terminal, X. We use subscripts such as 1, 2 to specify which non-terminals correspond to each other.

An invalid SCFG rule:
VP → ⟨PP_1 you NP_2, have NP_2 X_1⟩
This rule is invalid because a PP corresponds to an X: non-terminals that correspond to each other must be the same.

Another valid SCFG rule:
VP → ⟨PP_1 you NP_2, have NP_2 PP_1⟩
In this case three non-terminals (NP, PP and VP) are used. The above rule is perfectly valid in an SCFG; however, Chiang's grammar only makes use of two non-terminals: X and S.
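This well-formedness condition can be checked mechanically. A small sketch, with rule sides written as strings like "PP_1 you NP_2" (a toy notation assumed here, not Chiang's file format):

```python
import re

LINK = re.compile(r"([A-Za-z]+)_(\d+)")

def valid_scfg_rule(src, tgt):
    """Linked non-terminals must form a bijection with matching categories.
    Assumes each link index appears once per side, as in Chiang's rules."""
    ls = {m.group(2): m.group(1) for m in LINK.finditer(src)}
    lt = {m.group(2): m.group(1) for m in LINK.finditer(tgt)}
    return ls.keys() == lt.keys() and all(ls[i] == lt[i] for i in ls)

print(valid_scfg_rule("PP_1 you NP_2", "have NP_2 PP_1"))  # True  (valid)
print(valid_scfg_rule("PP_1 you NP_2", "have NP_2 X_1"))   # False (PP vs X)
```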

Intuition Behind Translation with an SCFG

First step: we can read off a CFG for Chinese from the SCFG, and parse the Chinese with this CFG. For example, X → ⟨yu X_1 you X_2, have X_2 with X_1⟩ implies the Chinese-only context-free rule X → yu X you X, and X → ⟨bangjiao, diplomatic relations⟩ implies the Chinese-only context-free rule X → bangjiao.

The resulting CFG for Chinese:
X → yu X you X
X → X de X
X → X zhiyi
X → Aozhou
X → Beihan
X → shi
X → bangjiao
X → shaoshu guojia

Second step: we use the synchronous rules to map the Chinese parse tree to an English parse tree.

[Figure: a parse tree for our example, with nested X constituents over "yu Beihan you bangjiao", "... de shaoshu guojia" and "... zhiyi".]

Start bottom-up. For example, X → ⟨Aozhou, Australia⟩ rewrites the Chinese leaf "Aozhou" as the English leaf "Australia".
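The first step can be sketched directly: erase the target sides and the link indices to obtain the monolingual grammar, then recognize with a brute-force bottom-up chart. This toy recognizer checks that "yu Beihan you bangjiao de shaoshu guojia zhiyi" is a single X constituent; real decoders use optimized CKY variants with pruning.

```python
import re

SCFG = [("yu X_1 you X_2", "have X_2 with X_1"),
        ("X_1 de X_2", "the X_2 that X_1"),
        ("X_1 zhiyi", "one of X_1"),
        ("Beihan", "North Korea"),
        ("bangjiao", "diplomatic relations"),
        ("shaoshu guojia", "few countries")]

# Chinese-only CFG: keep the source sides, erase the link indices.
CFG = [re.sub(r"X_\d+", "X", src).split() for src, _ in SCFG]

def matches(pat, words, lo, hi, chart):
    """Can the rule body `pat` ('X' = non-terminal) derive words[lo:hi]?"""
    if not pat:
        return lo == hi
    if pat[0] != "X":
        return lo < hi and words[lo] == pat[0] and matches(pat[1:], words, lo + 1, hi, chart)
    return any(chart[lo][mid] and matches(pat[1:], words, mid, hi, chart)
               for mid in range(lo + 1, hi + 1))

def recognize(words, rules):
    n = len(words)
    chart = [[False] * (n + 1) for _ in range(n + 1)]
    for span in range(1, n + 1):              # bottom-up over span length
        for lo in range(n - span + 1):
            hi = lo + span
            chart[lo][hi] = any(matches(r, words, lo, hi, chart) for r in rules)
    return chart[0][n]

print(recognize("yu Beihan you bangjiao de shaoshu guojia zhiyi".split(), CFG))  # True
```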

The tree after all the lowest-level rules are applied:
[Figure: the same parse tree, with each Chinese leaf replaced by its English translation: Australia, is, North Korea, diplomatic relations, few countries.]

Next, apply higher-level rules. For example, use X → ⟨yu X_1 you X_2, have X_2 with X_1⟩ to get "have diplomatic relations with North Korea".

Use X → ⟨X_1 de X_2, the X_2 that X_1⟩ to get "the few countries that have diplomatic relations with North Korea".

Use X → ⟨X_1 zhiyi, one of X_1⟩ to get "one of the few countries that have diplomatic relations with North Korea", which, combined with "Australia is", yields the full translation.

What is missing here, but can be found in the paper:
derivations in an SCFG
learning an SCFG grammar
probability calculation

But here are some results, from Chiang 2007 (BLEU scores for Chinese-to-English translation; MT03, MT04 and MT05 are three different test sets):

        MT03   MT04   MT05
ATS     30.84  31.74  30.50
Hiero   33.72  34.57  31.79

Syntax-based MT

Two camps:
Syntax will improve translation (K. Knight)
Simpler data-driven models will always win (F. Och)

Joshua: an open-source toolkit for parsing-based MT

Syntax-based MT

Example from Fox (2002): phrases are not coherent in bitexts.

English: "There will be more divisiveness than positive effects"
French: "Elle aura de les effets plus destructifs que positifs"
Gloss: "It will have effects more destructive than positive"

[Figure: the English parse tree (EX there, MD will, AUX be, ADJP with JJR more, NN divisiveness, IN than, JJ positive, NN effects) aligned to the French sentence; the alignment links cross constituent boundaries.]

WHY????

[Figure from Koehn et al. (2003): BLEU (roughly 18.0 to 28.0) versus training-corpus size (10k to 320k sentence pairs) for IBM Model 4, phrase-based MT, and phrase-based MT restricted to syntactic phrases; restricting phrases to syntactic constituents hurts performance.]

WHAT SHOULD WE DO????

Syntax-based MT

Syntactic translation models incorporate syntax on the source and/or target language side.

[Figure: the Vauquois triangle, from foreign words up through foreign syntax and semantics to an interlingua, and down through English semantics and syntax to English words.]

Syntactic phrase-based models, based on tree transducers:
Tree-to-string: build mappings from target parse trees to source strings.
String-to-tree: build mappings from target strings to source parse trees.
Tree-to-tree: build mappings from parse trees to parse trees.

Example of a string-to-tree translation system, using English syntax trees [Yamada and Knight, 2001]:
exploits rich resources on the English side, obtained with a statistical parser [Collins, 1997]
trees are flattened to allow more reorderings
works well with a syntactic language model

Advantages of syntax-based translation:
reordering for syntactic reasons (e.g., move the German object to the end of the sentence)
better explanation for function words (e.g., prepositions, determiners)
conditioning on syntactically related words (the translation of a verb may depend on its subject or object)
use of syntactic language models

The channel operations, on the example "he adores listening to music" → "Kare ha ongaku wo kiku no ga daisuki desu":
reorder: PRP VB1 VB2 → PRP VB2 VB1; VB TO → TO VB; TO NN → NN TO
insert particles: kare ha, ... ga, daisuki desu, kiku no
translate leaves: he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki
take leaves: "Kare ha ongaku wo kiku no ga daisuki desu"
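A compact sketch of these channel operations applied deterministically to the example. The reorder, insertion and translation tables hard-code the choices shown above (the real Yamada-Knight model assigns probabilities to every alternative), and the root label is simplified to S so the insertion table keys stay unambiguous.

```python
# Toy parse of "he adores listening to music" (root relabeled S for clarity).
tree = ("S", ("PRP", "he"),
             ("VB1", "adores"),
             ("VB2", ("VB", "listening"),
                     ("TO", ("TO", "to"), ("NN", "music"))))

REORDER = {("PRP", "VB1", "VB2"): ("PRP", "VB2", "VB1"),
           ("VB", "TO"): ("TO", "VB"),
           ("TO", "NN"): ("NN", "TO")}
INSERT = {"PRP": ["ha"], "VB": ["no"], "VB2": ["ga"], "VB1": ["desu"]}
TRANSLATE = {"he": "kare", "adores": "daisuki", "listening": "kiku",
             "to": "wo", "music": "ongaku"}

def channel(node):
    label, *kids = node
    if len(kids) == 1 and isinstance(kids[0], str):   # leaf: translate the word
        out = [TRANSLATE[kids[0]]]
    else:                                             # internal node: reorder kids
        order = REORDER.get(tuple(k[0] for k in kids))
        if order:
            kids = sorted(kids, key=lambda k: order.index(k[0]))
        out = [w for k in kids for w in channel(k)]
    return out + INSERT.get(label, [])                # insert particle(s)

print(" ".join(channel(tree)))
# kare ha ongaku wo kiku no ga daisuki desu
```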

Syntax-based MT

The reordering probabilities (r-table):

Original order   Reordering    p(reorder | original)
PRP VB1 VB2      PRP VB1 VB2   0.074
PRP VB1 VB2      PRP VB2 VB1   0.723
PRP VB1 VB2      VB1 PRP VB2   0.061
PRP VB1 VB2      VB1 VB2 PRP   0.037
PRP VB1 VB2      VB2 PRP VB1   0.083
PRP VB1 VB2      VB2 VB1 PRP   0.021
VB TO            VB TO         0.107
VB TO            TO VB         0.893
TO NN            TO NN         0.251
TO NN            NN TO         0.749

Decoding as parsing: chart parsing over the Japanese string "kare ha ongaku wo kiku no ga daisuki desu".
Pick Japanese words and translate them into tree stumps (e.g., PRP "he" over kare, NN "music" over ongaku, TO "to" over wo).
Adding some more entries...
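Since each node's reordering decision is made independently, the probability of the complete reordering used in our running example is simply the product of the relevant r-table entries; a one-screen sketch:

```python
# One reorder decision per internal node; the probabilities multiply.
R_TABLE = {
    (("PRP", "VB1", "VB2"), ("PRP", "VB2", "VB1")): 0.723,
    (("VB", "TO"),          ("TO", "VB")):          0.893,
    (("TO", "NN"),          ("NN", "TO")):          0.749,
}

p = 1.0
for decision in R_TABLE:
    p *= R_TABLE[decision]
print(f"p(reordering) = {p:.4f}")  # p(reordering) = 0.4836
```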

Syntax-based MT

Decoding as parsing (continued):
Combine entries into larger constituents (e.g., NN "music" and TO "to" into a PP; the PP and VB "listening" into VB2; finally PRP, VB2 and VB1 "adores" into the full tree).
Finished when all foreign words are covered.

Syntax-based MT

How realistic is this model? Do English trees match foreign strings?
Crossings between French and English [Fox, 2002]: 0.29-6.27 per sentence, depending on how it is measured.
Can be reduced by:
flattening the tree, as done by [Yamada and Knight, 2001]
detecting phrasal translation
special treatment for a small number of constructions
Most coherence is found between dependency structures.

Other syntax-based systems:

U. Alberta (Microsoft): treelet translation
Translating from English, using a dependency parser on the English side.
Project the dependency tree onto the foreign language for training.
Map parts of the dependency tree ("treelets") into the foreign language.

Reranking phrase-based MT output with syntactic features:
Create n-best lists with phrase-based MT.
POS-tag and parse the candidate translations.
Rerank with syntactic features.

Syntax-aided phrase-based MT (Koehn, 2005):
Stick with phrase-based systems.
Special treatment for specific syntactic problems (NP treatment, clause restructuring).

ISI: extended work of Yamada and Knight.
More complex rules; performance approaching phrase-based systems.

Prague: translation via dependency structures.
Parallel Czech-English treebank; tecto-grammatical translation model.

So, syntax: does it help?
Not yet: the best systems are still phrase-based and treat words as tokens.
Well, maybe: work on reordering German, and automatically trained tree-transfer systems are promising.
Why not yet?
If we use real syntax, we need good parsers; are they good enough?
Syntactic annotations add a level of complexity: difficult to handle, slow to train and decode.
Few researchers are good at statistical modeling and also understand syntactic theories.

Next session. Practical workshop: let's get our hands dirty!