Factored SMT Models. Q.Q June 3, 2014


Standard phrase-based models
Limitation of phrase-based models: no explicit use of linguistic information.

Word = Token
- Words in different forms are treated independently of each other (e.g. eat, eating, ate, eaten).
- Unknown word forms cannot be translated, a particular problem in morphologically rich languages.

Integration of linguistic information into the translation model:
- Draw on richer statistics
- Overcome data sparseness problems
- Direct modeling of linguistic aspects
- Improved reordering in the translation result

Word = Vector
Each word is represented as a vector of factors, with corresponding factors on the input and output side:
Input:  word | lemma | POS | morphology | word class | ...
Output: word | lemma | POS | morphology | word class | ...

Factored translation model
Mapping steps connect the factors of the input to the factors of the output:
Input:  word | lemma | POS | morphology | word class
Output: word | lemma | POS | morphology | word class
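As a concrete illustration, a factored word can be modeled as a small data structure printed in the pipe-separated notation used below; this is a minimal sketch, and the class and field names are illustrative, not part of Moses:

```python
# A minimal sketch of a factored word representation (illustrative, not Moses code).
# Each surface token carries parallel annotation layers ("factors").
from dataclasses import dataclass

@dataclass(frozen=True)
class FactoredWord:
    surface: str
    lemma: str
    pos: str
    morph: str  # e.g. "plural-nominative-neutral"

    def __str__(self):
        # pipe-separated factored notation, as used in the examples below
        return "|".join([self.surface, self.lemma, self.pos, self.morph])

w = FactoredWord("häuser", "haus", "NN", "plural-nominative-neutral")
print(w)  # häuser|haus|NN|plural-nominative-neutral
```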

Decomposition
1. Translate input lemma to output lemma
2. Translate morphological and POS factors
3. Generate surface forms given the lemma and linguistic factors

neue häuser werden gebaut → new houses are built
Analysis of the input word häuser:
Surface form: häuser
Lemma:        haus
POS:          NN
Count:        plural
Case:         nominative
Gender:       neutral

neue häuser werden gebaut → new houses are built
Input phrase expansion:
1. Translate input lemma to output lemma:
   haus → house, home, building, shell
2. Translate morphological and POS factors:
   NN|plural-nominative-neutral → NN|plural, NN|singular
3. Generate surface forms given the lemma and linguistic factors:
   house|NN|plural → houses
   house|NN|singular → house
   home|NN|plural → homes

neue häuser werden gebaut → new houses are built
Input: häuser|haus|NN|plural-nominative-neutral
List of translation options:
1. Translate input lemma to output lemma:
   { ?|house|?|?, ?|home|?|?, ?|building|?|?, ?|shell|?|? }
2. Translate morphological and POS factors:
   { ?|house|NN|plural, ?|home|NN|plural, ?|building|NN|plural, ?|shell|NN|plural, ?|house|NN|singular, ... }
3. Generate surface forms given the lemma and linguistic factors:
   { houses|house|NN|plural, homes|home|NN|plural, buildings|building|NN|plural, shells|shell|NN|plural, house|house|NN|singular, ... }
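The three mapping steps compose mechanically into the list of options above; a minimal sketch with hypothetical toy tables (the table contents mirror the haus example, and the function names are invented):

```python
# Toy tables illustrating how the three mapping steps compose into
# translation options for häuser|haus|NN|plural-nominative-neutral.
from itertools import product

lemma_table = {"haus": ["house", "home", "building", "shell"]}
morph_table = {("NN", "plural-nominative-neutral"): [("NN", "plural"), ("NN", "singular")]}
generation_table = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("home", "NN", "plural"): "homes",
    ("building", "NN", "plural"): "buildings",
    ("shell", "NN", "plural"): "shells",
}

def expand(lemma, pos, morph):
    """Compose the two translation steps, then the generation step."""
    options = []
    for out_lemma, (out_pos, out_morph) in product(
            lemma_table.get(lemma, []), morph_table.get((pos, morph), [])):
        surface = generation_table.get((out_lemma, out_pos, out_morph))
        if surface is not None:  # drop combinations the generation model cannot realize
            options.append((surface, out_lemma, out_pos, out_morph))
    return options

print(expand("haus", "NN", "plural-nominative-neutral"))
```

Note how combinations without a generation entry (e.g. a singular form for shell) are silently dropped, which is one source of the option explosion discussed later.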

Synchronous factored models
- Translation steps operate on the phrase level
- Generation steps operate on the word level

Training
1. Prepare the training data (run automatic tools on the corpus to add the factor information)
2. Establish word alignment (symmetrized GIZA++ alignments)
3. Map steps to components of the overall model
4. Extract phrase pairs that are consistent with the word alignment
5. Estimate scoring functions (conditional phrase translation probabilities, lexical translation probabilities)

Word alignment

Extract phrase: natürlich hat john # naturally john has

Extract phrase for other factors: ADV V NNP # ADV NNP V
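The consistency criterion behind phrase extraction can be sketched as follows; the alignment links encode the natürlich hat john example, and the function name is illustrative:

```python
# A sketch of the consistency criterion used in phrase extraction:
# a phrase pair is kept only if no alignment link connects a word inside
# the pair to a word outside it.

def consistent(alignment, f_start, f_end, e_start, e_end):
    """alignment: set of (f_index, e_index) links; spans are inclusive."""
    has_link = False
    for f, e in alignment:
        inside_f = f_start <= f <= f_end
        inside_e = e_start <= e <= e_end
        if inside_f != inside_e:   # link crosses the phrase boundary
            return False
        if inside_f and inside_e:
            has_link = True
    return has_link  # require at least one link inside the pair

# natürlich hat john # naturally john has
# links: natürlich-naturally (0,0), hat-has (1,2), john-john (2,1)
align = {(0, 0), (1, 2), (2, 1)}
print(consistent(align, 0, 2, 0, 2))  # whole phrase pair: True
print(consistent(align, 1, 1, 2, 2))  # hat # has: True
print(consistent(align, 0, 1, 0, 1))  # crosses the hat-has link: False
```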

Training the generation model
- On the output side only: no word alignment is needed
- Additional monolingual data may be used
- Learned on a word-for-word basis

Map factor(s) to factor(s); example: word → POS and POS → word
The/DET big/ADJ tree/NN
Count collection:
count(the, DET)++
count(big, ADJ)++
count(tree, NN)++
Probability distributions (maximum-likelihood estimates):
p(the|DET) and p(DET|the)
p(big|ADJ) and p(ADJ|big)
p(tree|NN) and p(NN|tree)
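The count collection and maximum-likelihood estimation above can be sketched directly:

```python
# Count collection and maximum-likelihood estimation for the generation
# model, on the toy tagged sequence The/DET big/ADJ tree/NN.
from collections import Counter

tagged = [("the", "DET"), ("big", "ADJ"), ("tree", "NN")]

pair_counts = Counter(tagged)                  # count(word, tag)
word_counts = Counter(w for w, _ in tagged)    # count(word)
tag_counts = Counter(t for _, t in tagged)     # count(tag)

def p_word_given_tag(word, tag):
    return pair_counts[(word, tag)] / tag_counts[tag]

def p_tag_given_word(tag, word):
    return pair_counts[(word, tag)] / word_counts[word]

print(p_word_given_tag("the", "DET"))  # 1.0 on this toy corpus
print(p_tag_given_word("NN", "tree"))  # 1.0
```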

Combination of components
- Language model
- Reordering model
- Translation steps
- Generation steps

Efficient decoding
Mapping steps introduce additional complexity: the single phrase table becomes multiple tables.

Pre-computation
Prior to the heuristic beam search:
- The expansions of the mapping steps can be pre-computed and stored as translation options.
- All possible translation options are thus computed before decoding.
- No change to the fundamental search algorithm is needed.

Beam search
- Start from the empty hypothesis.
- Create new hypotheses by applying all applicable translation options.
- Generate further hypotheses in the same manner.
- Once the full input sentence is covered, the highest-scoring complete hypothesis is the best translation according to the model.
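A toy, monotone version of this decoding loop might look as follows; real factored decoding supports reordering and richer component scores, and all names and numbers here are illustrative:

```python
# A toy, monotone beam-search sketch of the decoding loop described above.
import heapq

def beam_search(options_per_position, beam_size=3):
    """options_per_position[i]: list of (output_word, log_prob) for input position i."""
    hypotheses = [(0.0, [])]  # start from the empty hypothesis
    for options in options_per_position:
        expanded = [
            (score + lp, out + [word])
            for score, out in hypotheses
            for word, lp in options
        ]
        # keep only the best beam_size partial hypotheses
        hypotheses = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    best_score, best_output = max(hypotheses, key=lambda h: h[0])
    return best_output, best_score

opts = [
    [("new", -0.1), ("novel", -1.2)],
    [("houses", -0.2), ("homes", -0.9)],
    [("are", -0.1)],
    [("built", -0.3), ("constructed", -1.1)],
]
print(beam_search(opts))  # best output: ['new', 'houses', 'are', 'built']
```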

Problem
There may be too many translation options to handle, caused by the vast increase in expansions produced by one or more mapping steps.

Current solution
- Early pruning of expansions
- A limit on the number of translation options per input phrase (max: 50)
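The per-phrase limit can be sketched as a simple sort-and-truncate over scored options; the names and scores here are hypothetical:

```python
# A sketch of the translation-option limit: after expansion, keep only the
# highest-scoring options per input phrase (the limit above is 50).

def prune_options(options, limit=50):
    """options: list of (score, option); higher score = better."""
    return sorted(options, key=lambda o: o[0], reverse=True)[:limit]

# 200 hypothetical expansions with decreasing scores
expanded = [(-(i * 0.1), f"option-{i}") for i in range(200)]
kept = prune_options(expanded)
print(len(kept))   # 50
print(kept[0][1])  # option-0 (the highest-scoring one)
```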

Experiments and results
Moses system: http://www.statmt.org/moses/

Syntactically enriched output
Input:  word
Output: word (tri-gram LM) and POS (7-gram LM)

Syntactically enriched output
English - German, Europarl, 30 million words, 2006
Model                   BLEU
best published result   18.15%
baseline (surface)      18.04%
surface + POS           18.15%
surface + POS + morph   18.22%

Morphological analysis and generation
Input:  word | lemma | POS | morphology
Output: word | lemma | POS | morphology

Morphological analysis and generation
German - English, News Commentary data, 1 million words, 2007
Model                       BLEU
baseline (surface)          18.19%
+ POS LM                    19.05%
pure lemma/morph model      14.46%
backoff lemma/morph model   19.47%

Use of automatic word classes
Input:  word
Output: word (tri-gram LM) and word class (7-gram LM)

Use of automatic word classes
English - Chinese, IWSLT, 39,953 sentences, 2006
Model                  BLEU
baseline (surface)     19.54%
surface + word class   21.10%

Integrated recasing
Input:  lower-cased word
Output: lower-cased word and mixed-cased word

Integrated recasing
Chinese - English, IWSLT, 39,953 sentences, 2006
Model                                   BLEU
standard two-pass: SMT + recase         20.65%
integrated factored model (optimized)   21.08%

References
P. Koehn and H. Hoang, "Factored translation models", Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868-876, 2007.
P. Koehn, Statistical Machine Translation, Cambridge University Press, UK, pp. 127-130, 2010.
P. Porkaew, A. Takhom and T. Supnithi, "Factored Translation Model in English-to-Thai Translation", Eighth International Symposium on Natural Language Processing, 2009.
S. Li, D. Wong and L. Chao, "Korean-Chinese statistical translation model", Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xi'an, 2012.