Factored SMT Models
June 3, 2014
Standard phrase-based models
Limitations of phrase-based models:
- No explicit use of linguistic information
- Word = token: words in different surface forms are treated independently of each other (e.g. eat, eating, ate, eaten)
- Unknown word forms cannot be translated, a particular problem in morphologically rich languages
Integration of linguistic information into the translation model:
- Draw on richer statistics
- Overcome data sparseness problems
- Direct modeling of linguistic aspects
- Reordering in the translation result
Word = vector of factors
Each word is represented not as a single token but as a vector of factors:
input: word, lemma, POS, morphology, word class, ...
output: word, lemma, POS, morphology, word class, ...

Factored translation model
[diagram: each input factor (word, lemma, POS, morphology, word class) is mapped to the corresponding output factor]
Decomposition of the translation into three mapping steps:
1. Translate input lemma to output lemma
2. Translate morphological and POS factors
3. Generate surface forms given the lemma and linguistic factors
Example: neue häuser werden gebaut → new houses are built
Factored representation of häuser:
surface form: häuser | lemma: haus | POS: NN | count: plural | case: nominative | gender: neutral
Input phrase expansion of häuser (neue häuser werden gebaut → new houses are built):
1. Translate input lemma to output lemma: haus → {house, home, building, shell}
2. Translate morphological and POS factors: NN|plural-nominative-neutral → {NN|plural, NN|singular}
3. Generate surface forms given the lemma and linguistic factors: house|NN|plural → houses, house|NN|singular → house, home|NN|plural → homes
Translation options for häuser|haus|NN|plural-nominative-neutral (neue häuser werden gebaut → new houses are built):
1. After lemma translation: {?|house|?|?, ?|home|?|?, ?|building|?|?, ?|shell|?|?}
2. After factor translation: {?|house|NN|plural, ?|home|NN|plural, ?|building|NN|plural, ?|shell|NN|plural, ?|house|NN|singular, ...}
3. After generation: {houses|house|NN|plural, homes|home|NN|plural, buildings|building|NN|plural, shells|shell|NN|plural, house|house|NN|singular, ...}
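The three mapping steps above can be sketched as chained table lookups; a minimal Python illustration with toy tables mirroring the häuser example (all table contents and function names are illustrative, not from a real system):

```python
# Toy factored-model tables for the "häuser" example; illustrative only.
lemma_table = {"haus": ["house", "home", "building", "shell"]}
factor_table = {("NN", "plural-nominative-neutral"): [("NN", "plural"), ("NN", "singular")]}
generation_table = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("home", "NN", "plural"): "homes",
    ("building", "NN", "plural"): "buildings",
    ("shell", "NN", "plural"): "shells",
}

def expand(lemma, pos, morph):
    """Expand one input word into factored translation options via the
    three mapping steps: lemma -> lemma, factors -> factors, then
    surface-form generation."""
    options = []
    for out_lemma in lemma_table.get(lemma, []):                        # step 1
        for out_pos, out_morph in factor_table.get((pos, morph), []):   # step 2
            surface = generation_table.get((out_lemma, out_pos, out_morph))  # step 3
            if surface is not None:
                options.append((surface, out_lemma, out_pos, out_morph))
    return options

print(expand("haus", "NN", "plural-nominative-neutral"))
```

Combinations without a generation-table entry (e.g. home|NN|singular) are dropped, which is why the option list is shorter than the full cross-product.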
Synchronous factored models: translation steps operate on the phrase level; generation steps operate on the word level.
Training:
1. Prepare the training data (run automatic tools on the corpus to add factor information)
2. Establish word alignment (symmetrized GIZA++ alignments)
3. Map steps to form components of the overall model
4. Extract phrase pairs that are consistent with the word alignment
5. Estimate scoring functions (conditional phrase translation probabilities, lexical translation probabilities)
Word alignment
Extract phrase pair: natürlich hat john # naturally john has
The same word alignment yields phrase pairs for the other factors: ADV V NNP # ADV NNP V
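Phrase pairs consistent with the word alignment can be extracted with the standard consistency check (no alignment link may connect a word inside the phrase pair to a word outside it). A minimal sketch, ignoring the usual extension to unaligned boundary words, with a toy alignment for "natürlich hat john / naturally john has":

```python
def extract_phrases(src_len, tgt_len, alignment, max_len=7):
    """Extract all phrase pairs consistent with a word alignment.

    alignment: set of (src_index, tgt_index) links.
    A pair of spans is consistent if every link touching either span
    stays inside the other span."""
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            # target positions linked to the source span
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            # consistency: every link into the target span must come from the source span
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.append(((s1, s2), (t1, t2)))
    return pairs

# Toy alignment: natürlich-naturally (0,0), hat-has (1,2), john-john (2,1)
alignment = {(0, 0), (1, 2), (2, 1)}
for pair in extract_phrases(3, 3, alignment):
    print(pair)
```

Note that the crossing links for hat/has and john/john block the span "natürlich hat ↔ naturally john", while the full sentence pair is extracted; the same spans then yield phrase pairs over the POS factors.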
Training the generation model:
- On the output side only: no word alignment needed
- Additional monolingual data may be used
- Learned on a word-for-word basis
Map factor(s) to factor(s). Example: word → POS and POS → word
The/DET big/ADJ tree/NN
Count collection: count(the, DET)++, count(big, ADJ)++, count(tree, NN)++
Probability distributions (maximum likelihood estimates):
p(the | DET) and p(DET | the)
p(big | ADJ) and p(ADJ | big)
p(tree | NN) and p(NN | tree)
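The count-and-normalize procedure above can be sketched directly; a minimal illustration (the tagged toy corpus and the function name are hypothetical):

```python
from collections import Counter

def train_generation_model(tagged_words):
    """Estimate p(pos | word) and p(word | pos) by maximum likelihood
    from (word, pos) pairs collected word-for-word on the output side."""
    pair_counts = Counter(tagged_words)
    word_counts = Counter(w for w, _ in tagged_words)
    pos_counts = Counter(p for _, p in tagged_words)
    p_pos_given_word = {(w, p): c / word_counts[w] for (w, p), c in pair_counts.items()}
    p_word_given_pos = {(w, p): c / pos_counts[p] for (w, p), c in pair_counts.items()}
    return p_pos_given_word, p_word_given_pos

# Toy tagged corpus; "saw" is ambiguous between NN and VBD.
corpus = [("the", "DET"), ("saw", "NN"), ("saw", "VBD"), ("tree", "NN")]
p_pos_w, p_w_pos = train_generation_model(corpus)
print(p_pos_w[("saw", "NN")])   # 0.5
print(p_w_pos[("tree", "NN")])  # 0.5
```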
Combination of components:
- Language model
- Reordering model
- Translation steps
- Generation steps
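As in standard phrase-based SMT, these components serve as feature functions combined in a log-linear model; schematically:

```latex
p(e \mid f) \propto \exp \sum_{i} \lambda_i \, h_i(e, f)
```

where the h_i are the component scores (language model, reordering model, translation-step and generation-step scores) and the weights λ_i are tuned on held-out data.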
Efficient decoding: the mapping steps add complexity, since a single phrase table becomes multiple tables.
Pre-computation
Prior to the heuristic beam search, the expansions of the mapping steps can be pre-computed and stored as translation options. All possible translation options are computed before decoding, so no change to the fundamental search algorithm is required.
Beam search:
1. Start with an empty hypothesis
2. Create new hypotheses by applying all applicable translation options
3. Generate further hypotheses in the same manner
4. Continue until the full input sentence is covered
The highest-scoring complete hypothesis is the best translation according to the model.
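The search loop above can be sketched as a heavily simplified monotone stack decoder; real decoders add reordering, hypothesis recombination, and future-cost estimation, and all names and scores here are illustrative:

```python
def beam_search(sentence, options, lm_score, beam_size=10):
    """Simplified monotone beam search.

    options[(i, j)]: list of (target_phrase, log_prob) covering input words i..j-1.
    Hypotheses are (output_words, score); stacks[n] holds hypotheses
    covering the first n input words."""
    stacks = [[] for _ in range(len(sentence) + 1)]
    stacks[0] = [((), 0.0)]                                   # empty hypothesis
    for n in range(len(sentence)):
        # histogram pruning: keep only the beam_size best hypotheses
        stacks[n] = sorted(stacks[n], key=lambda h: h[1], reverse=True)[:beam_size]
        for out, score in stacks[n]:
            for j in range(n + 1, len(sentence) + 1):
                for phrase, logp in options.get((n, j), []):
                    new_out = out + tuple(phrase.split())
                    stacks[j].append((new_out, score + logp + lm_score(new_out)))
    complete = stacks[len(sentence)]                          # full input covered
    return max(complete, key=lambda h: h[1]) if complete else None

# Toy pre-computed translation options for "neue häuser werden gebaut"
options = {(0, 1): [("new", -0.1)],
           (1, 2): [("houses", -0.2), ("house", -0.9)],
           (2, 4): [("are built", -0.3)]}
best = beam_search(["neue", "häuser", "werden", "gebaut"], options, lambda w: 0.0)
print(best[0])  # ('new', 'houses', 'are', 'built')
```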
Problem: too many translation options to handle, caused by a vast increase in expansions when one or more mapping steps are applied.
Current solution:
- Early pruning of expansions
- Limit on the number of translation options per input phrase (max. 50)
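The per-phrase cap can be realized as simple histogram pruning of each option list; a minimal sketch (the limit of 50 matches the slide, the scores are illustrative):

```python
def prune_options(options, max_options=50):
    """Keep only the highest-scoring translation options for one input phrase."""
    return sorted(options, key=lambda opt: opt[1], reverse=True)[:max_options]

# toy options: (target phrase, model log-score)
opts = [("houses", -0.2), ("house", -0.9), ("homes", -1.4), ("buildings", -2.0)]
print(prune_options(opts, max_options=2))  # [('houses', -0.2), ('house', -0.9)]
```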
Experiments and results Moses system http://www.statmt.org/moses/
Syntactically enriched output
Input: word. Output: word (trigram LM) and POS (7-gram LM).
Syntactically enriched output
English-German, Europarl, 30 million words, 2006

Model                    BLEU
best published result    18.15%
baseline (surface)       18.04%
surface + POS            18.15%
surface + POS + morph    18.22%
Morphological analysis and generation
Input factors: word, lemma, POS, morphology. Output factors: word, lemma, POS, morphology.
Morphological analysis and generation
German-English, News Commentary data, 1 million words, 2007

Model                        BLEU
baseline (surface)           18.19%
+ POS LM                     19.05%
pure lemma/morph model       14.46%
backoff lemma/morph model    19.47%
Use of automatic word classes
Input: word. Output: word (trigram LM) and word class (7-gram LM).
Use of automatic word classes
English-Chinese, IWSLT, 39,953 sentences, 2006

Model                   BLEU
baseline (surface)      19.54%
surface + word class    21.10%
Integrated recasing
Input: lower-cased. Output: lower-cased and mixed-cased.
Integrated recasing
Chinese-English, IWSLT, 39,953 sentences, 2006

Model                                  BLEU
standard two-pass: SMT + recase        20.65%
integrated factored model (optimized)  21.08%
References
- P. Koehn and H. Hoang, "Factored translation models", Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868-876, 2007.
- P. Koehn, Statistical Machine Translation, Cambridge University Press, UK, pp. 127-130, 2010.
- P. Porkaew, A. Takhom and T. Supnithi, "Factored Translation Model in English-to-Thai Translation", Eighth International Symposium on Natural Language Processing, 2009.
- S. Li, D. Wong and L. Chao, "Korean-Chinese statistical translation model", Proceedings of the 2012 International Conference on Machine Learning and Cybernetics, Xi'an, 2012.