Machine Learning in Statistical Machine Translation

Phil Blunsom and Philipp Koehn, 26 November 2008

Machine Translation
- Task: make sense of foreign text [example image not preserved]
- AI-hard: ultimately, reasoning and world knowledge are required
- Statistical machine translation: learn how to translate from data

Prediction Problem
- Given an input sentence, we have to predict an output translation
  Ich gehe ja nicht zum Haus.
  I do not go to the house.
- Since the set of possible output sentences is too large, we need to construct the translation according to some decomposition of the translation process

Word-Based Model
- Original statistical machine translation models (1990s): break translation down to the word level

Phrase-Based Model
- Current state of the art: map larger chunks of words (huge mapping tables)

Tree-Based Model
[Figure: aligned syntax trees for the German sentence "Sie will eine Tasse Kaffee trinken" (tags PPER VAFIN ART NN NN VVINF) and its English counterpart, built from the words "she wants to drink a cup of coffee"]
- One way forward: generate the translation with syntactic structure

Structured Prediction
- A prediction problem
  - given an input, predict an output
  - many example (input, output) pairs available
- But: the space of possible outputs is too large
  - prediction has to be broken down into steps
  - decomposition of the problem is a hidden variable
  - search space too large to explore exhaustively
- Additional trouble
  - there is not a single right translation; many are possible
  - evaluation of machine translation is unclear

Learning Problem: Word Alignment
- For many models, an essential first step is establishing the word alignment in the training data
  michael assumes that he will stay in the house
  michael geht davon aus, dass er im haus bleibt
- Very little labeled data available
- Typically treated as an unsupervised learning problem
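To make the unsupervised view of word alignment concrete, here is a minimal sketch of expectation-maximization over word translation probabilities. It follows the classic IBM Model 1 recipe, which the slides do not name, so treat the choice of model as an assumption. The function name `train_ibm1` and the three-sentence toy corpus are hypothetical.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM for word translation probabilities t[(target, source)] ~ p(target | source).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    No alignment labels are used; alignments are treated as hidden.
    """
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected pair counts
        total = defaultdict(float)   # expected counts per source word
        # E-step: distribute each target word over the source words it may align to.
        for src, tgt in corpus:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    frac = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += frac
                    total[w_s] += frac
        # M-step: renormalize the expected counts.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t

# Hypothetical toy corpus (German source, English target), not data from the talk.
toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(toy_corpus)
```

On this toy corpus the probability mass for "Haus" shifts toward "house" over the iterations, because "the" is better explained by "das", which also appears in the second sentence pair.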

Learning Problem: Model Parameters
- The output translation from an input sentence is derived over several steps
  - segmentation of the input
  - word and phrase translation
  - reordering
- Each of the steps is modeled by probability distributions or features
- How do we learn the parameters for these models?

Heuristic Generative Model
- The translation process is decomposed into steps
- Each step is modeled with a probability distribution
- Phrase translation probability distributions are estimated by maximum likelihood estimation:
  $p(\text{house} \mid \text{Haus}) = \frac{\text{count}(\text{house}, \text{Haus})}{\text{count}(\text{Haus})}$
- This is a biased ML estimator; we'd like to replace it: Bayesian approach [Blunsom, Cohn and Osborne, 2008]
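The relative-frequency estimate above can be sketched in a few lines. The function name `estimate_phrase_table` and the list of extracted phrase pairs are hypothetical; a real system would collect the pairs from word-aligned training data.

```python
from collections import Counter

def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) = count(target, source) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(source for source, _ in phrase_pairs)
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }

# Hypothetical extracted (source, target) phrase pairs, not data from the talk.
pairs = [("Haus", "house"), ("Haus", "house"), ("Haus", "home"), ("Haus", "house")]
table = estimate_phrase_table(pairs)
print(table[("Haus", "house")])  # 3 of the 4 occurrences of "Haus" pair with "house"
```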

Discriminatively Combining Local Models
- Sentence translation is a combination of several component models:
  $p_{LM} \times p_{TM} \times p_{D}$
- These may be weighted:
  $p_{LM}^{\lambda_{LM}} \times p_{TM}^{\lambda_{TM}} \times p_{D}^{\lambda_{D}}$
- Many components $p_i$ with weights $\lambda_i$:
  $\prod_i p_i^{\lambda_i} = \exp\left(\sum_i \lambda_i \log p_i\right)$
- Optimize the weights $\lambda_i$ to directly optimize translation performance
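As a small numeric check of the weighted combination, the sketch below computes the score both as a weighted product and as the exponential of a weighted sum of log probabilities. The component probabilities and weights are made-up values, and `loglinear_score` is a hypothetical helper name.

```python
import math

def loglinear_score(probs, weights):
    """Return exp(sum_i lambda_i * log p_i), which equals prod_i p_i ** lambda_i."""
    return math.exp(sum(weights[name] * math.log(p) for name, p in probs.items()))

# Hypothetical component scores for one candidate translation.
probs = {"LM": 0.01, "TM": 0.2, "D": 0.5}
# Hypothetical weights; in practice these are tuned on held-out data.
weights = {"LM": 1.0, "TM": 0.8, "D": 0.3}

score = loglinear_score(probs, weights)
product = math.prod(p ** weights[name] for name, p in probs.items())
```

Working in log space is the usual choice in practice, since products of many small probabilities underflow quickly.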

Global Discriminative Model
- Where we are now: an unsatisfying mix of local models and global models
- Grand goal: train all parameters discriminatively to optimize translation
- Note:
  - hidden derivation
  - millions of sentence pairs
  - millions of features
  - heavy computational problem
- Ongoing work
  - Perceptron, MIRA [Arun and Koehn, 2007]
  - probabilistic model [Blunsom and Osborne, 2008]
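For intuition about perceptron-style training, here is a minimal sketch of one update over an n-best list. It is a generic structured perceptron step, not the specific procedure of the cited work. The feature names, the quality scores, and the function `perceptron_update` are all hypothetical.

```python
def perceptron_update(weights, nbest, learning_rate=1.0):
    """One perceptron step: move weights toward the best-quality candidate.

    nbest: list of (features, quality) pairs, where features is a dict of
    feature values and quality is a hypothetical sentence-level metric
    score (higher is better).
    """
    def model_score(features):
        return sum(weights.get(name, 0.0) * value for name, value in features.items())

    predicted = max(nbest, key=lambda item: model_score(item[0]))
    oracle = max(nbest, key=lambda item: item[1])
    if predicted is oracle:
        return dict(weights)  # the model already prefers the best candidate
    updated = dict(weights)
    for name, value in oracle[0].items():
        updated[name] = updated.get(name, 0.0) + learning_rate * value
    for name, value in predicted[0].items():
        updated[name] = updated.get(name, 0.0) - learning_rate * value
    return updated

# Hypothetical log-probability features for two candidate translations.
weights = {"lm": 1.0, "tm": 1.0}
nbest = [
    ({"lm": -2.0, "tm": -1.0}, 0.3),  # preferred by the current weights
    ({"lm": -3.0, "tm": -0.5}, 0.6),  # better according to the quality score
]
new_weights = perceptron_update(weights, nbest)
```

After the update the second candidate outscores the first under the new weights, which is the behavior the update is meant to produce.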

Deluge of Data
- Parallel texts: hundreds of millions of words
  - translation models take up gigabytes on disk
- Monolingual texts: trillions of words
  - much more than we can currently handle
- Need for efficient data structures and training methods
  - suffix arrays for on-the-fly translation models [Lopez et al., 2008]
  - randomized language models [Talbot and Osborne, 2008]
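The suffix-array idea can be illustrated with a minimal sketch: index the corpus once, then look up any phrase by binary search instead of storing every phrase in a table. This is a simplified illustration and not the cited system. The function names are hypothetical, and the naive construction here is far too slow for real corpora.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, suffix_array, phrase):
    """Return the sorted corpus positions where phrase (a list of tokens) occurs."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Lower bound: first suffix whose length-n prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose length-n prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])

# Hypothetical one-line "corpus", not data from the talk.
corpus_tokens = "das haus ist klein und das haus ist alt".split()
index = build_suffix_array(corpus_tokens)
positions = find_occurrences(corpus_tokens, index, ["das", "haus"])
```

With the source positions in hand, a system could collect the aligned target phrases at query time rather than precomputing a full phrase table.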

Related Task: Tools for Translators
- Learning task: predicting the next user input
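As a deliberately simple illustration of predicting the next user input, the sketch below suggests the next word from bigram counts over previously seen text. The slides do not say which model the translator tools use, so this is only an assumption for illustration. The function names and the example sentences are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count which words follow each word in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            following[prev_word][next_word] += 1
    return following

def suggest(following, prev_word, k=3):
    """Return up to k most frequent continuations of prev_word."""
    counts = following.get(prev_word, Counter())
    return [word for word, _ in counts.most_common(k)]

# Hypothetical history of translator output, not data from the talk.
history = [
    "i do not go to the house",
    "i do not go to the station",
    "we go to the house",
]
model = train_bigram(history)
suggestions = suggest(model, "the", k=1)
```

A realistic tool would also condition on the source sentence being translated, which this sketch ignores.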

Machine Translation at Edinburgh
- People
  - 2 faculty: Philipp Koehn and Miles Osborne
  - 3 postdocs, 1 research programmer, 7 PhD students
- Funding
  - European projects: EuroMatrix, EuroMatrixPlus
  - DARPA project: GALE
  - EPSRC project: Demeter
  - Industry: Google, Systran
- Resources for the community
  - our open source Moses decoder is the standard benchmark for the MT community
  - we organize MT evaluation campaigns, open source conventions, workshops
- Online demo: http://demo.statmt.org/webtrans/