Statistical Machine Translation: IBM Model 1
CS626/CS460
Anoop Kunchukuttan (anoopk@cse.iitb.ac.in)
Under the guidance of Prof. Pushpak Bhattacharyya

Why Statistical Machine Translation?
Building rule-based systems between every pair of languages, as in transfer-based systems, does not scale. Can translation models be learnt from data instead?
Many language phenomena and language divergences cannot be encoded in rules. Can translation patterns be memorized from data?

Noisy Channel Model
Models translation from sentence f to sentence e as a noisy channel: e passes through the channel and comes out as the noisy f; the task is to recover e from f.
e -> Noisy Channel -> f
P(f|e): translation model, addresses adequacy.
P(e): language model, addresses fluency.
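In symbols, the decoder recovers e by Bayes' rule; this is the standard statement of the noisy-channel objective, not a formula taken from the slide itself:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```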

Three Aspects
Modelling: propose a probabilistic model for sentence translation.
Training: learn the model parameters from data.
Decoding: given a new sentence, use the learnt model to translate the input sentence.
IBM Models 1 to 5 [1] define various generative models and their training procedures.

Generative Process 1
This process serves as the basis for IBM Models 1 and 2.
Given a sentence e of length l:
Select the length of the sentence f, say m.
For each position j in f: choose the position a_j in e to align to, then choose the word f_j.
The slide's figure illustrates this with the Hindi sentence भारतीय रेल दुनिया के सबसे बड़े नियोक्ताओं में एक है aligned word by word to "The Indian Railways is one of the largest employers in the world"; a_j = 0 denotes alignment to the NULL word e_0.
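One compact way to write the probability this process assigns to f together with an alignment a, roughly in the notation of [1], is sketched below; the particular forms chosen for the length, alignment, and word factors are what distinguish Models 1 and 2:

```latex
P(f, a \mid e) \;=\; P(m \mid e) \prod_{j=1}^{m} P(a_j \mid j, m, l)\; t(f_j \mid e_{a_j})
```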

Alignments
The generative process explains only one way of generating a sentence pair; each way corresponds to an alignment. The total probability of the sentence pair is the sum of the probabilities over all alignments.
Input: parallel sentences 1..S in languages E and F, but the alignments are not known.
Goal: learn the model P(f|e).
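Since the alignment is hidden, the sentence-pair probability marginalizes over it; each of the m positions of f may align to any of the l + 1 positions of e (including NULL):

```latex
P(f \mid e) \;=\; \sum_{a} P(f, a \mid e) \;=\; \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} P(f, a \mid e)
```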

IBM Model 1
IBM Model 1 is a special case of Generative Process 1.
Assumptions: uniform distribution for the length of f; all alignments are equally likely.
Goal: learn the parameters t(f|e) of the model P(f|e), for every word f in the F vocabulary and every word e in the E vocabulary.
Chicken-and-egg situation with respect to alignments and word translations.
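Under these two assumptions the likelihood takes the standard Model 1 closed form (ε is the constant sentence-length probability, and position i = 0 of e is the NULL word):

```latex
P(f \mid e) \;=\; \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```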

Model 1 Training
If the alignments were known, the translation probabilities could be calculated simply by counting the aligned words. Conversely, if the translation probabilities were known, the alignments could be estimated. We know neither!
This suggests an iterative method in which the alignments and the translation probabilities are refined over time: the Expectation-Maximization (EM) algorithm.

Model 1 Training Algorithm
Initialize all t(f|e) to any value in [0,1]. Repeat the E-step and the M-step until the t(f|e) values converge.
E-step: for each sentence s in the training corpus, and for each (f, e) pair in it, compute the expected count c(f|e; f(s), e(s)), using the t(f|e) values from the previous iteration. c(f|e) is the expected count that f and e are aligned.
M-step: for each (f, e) pair, compute t(f|e), using the c(f|e) values computed in the E-step.
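A minimal Python sketch of this EM loop, written from the slide's description rather than taken from the course material; the NULL token and the uniform initialization over co-occurring word pairs are assumptions consistent with standard Model 1 training:

```python
from collections import defaultdict

def train_model1(parallel_corpus, iterations=20):
    """EM training of IBM Model 1 translation probabilities t(f|e).

    parallel_corpus: list of (f_tokens, e_tokens) pairs. A NULL token is
    prepended to every e sentence so that f words may align to nothing.
    """
    corpus = [(f, ["NULL"] + e) for f, e in parallel_corpus]

    # Initialize t(f|e) uniformly for all co-occurring (f, e) pairs.
    f_vocab = {f for f_sent, _ in corpus for f in f_sent}
    t = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:
            for e in e_sent:
                t[(f, e)] = 1.0 / len(f_vocab)

    for _ in range(iterations):
        # E-step: expected counts c(f|e), using t(f|e) from the previous iteration.
        count = defaultdict(float)   # c(f|e)
        total = defaultdict(float)   # sum over f of c(f|e), used to normalize
        for f_sent, e_sent in corpus:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # normalization for this f
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```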

Let's train Model 1
Corpus (Hindi f, English e):
आक श ब क ज न क र त पर चल -- Akash walked on the road to the bank
य म नद तट पर चल -- Shyam walked on the river bank
आक श व र नद तट स र ट क च र ह रह ह -- Sand on the banks of the river is being stolen by Akash
Stats: 3 sentences; English (e) vocabulary size: 15; Hindi (f) vocabulary size: 18.
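To connect the toy corpus to the sketch above, one could run it roughly as below. The romanized Hindi tokens are a hypothetical transliteration used only for readability here (the slide shows the sentences in Devanagari), so the exact numbers will not match the slide's table:

```python
corpus = [
    ("aakaash bank jaane ke raaste par chala".split(),
     "akash walked on the road to the bank".split()),
    ("shyaam nadee tat par chala".split(),
     "shyam walked on the river bank".split()),
    ("aakaash dvaara nadee tat se ret kee chori ho rahee hai".split(),
     "sand on the banks of the river is being stolen by akash".split()),
]
t = train_model1(corpus, iterations=20)
# Learnt probabilities for a few (f, e) pairs; compare the trend with the
# table on the next slide.
print(round(t[("aakaash", "akash")], 3))
print(round(t[("tat", "bank")], 3))
print(round(t[("tat", "river")], 3))
```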

Model 1 in Action
Expected counts c(f|e), per sentence, across EM iterations:

f     | e     | sentence | Iter 1 | Iter 2 | Iter 5 | Iter 19 | Iter 20
आक श  | akash | 1        | 0.066  | 0.083  | 0.29   | 0.836   | 0.846
आक श  | akash | 2        | 0      | 0      | 0      | 0       | 0
आक श  | akash | 3        | 0.066  | 0.083  | 0.29   | 0.836   | 0.846
ब क   | bank  | 1        | 0.066  | 0.12   | 0.09   | 0.067   | 0.067
ब क   | bank  | 2        | 0      | 0      | 0      | 0       | 0
ब क   | bank  | 3        | 0      | 0      | 0      | 0       | 0

Translation probabilities t(f|e) across EM iterations:

f     | e     | Iter 1 | Iter 2 | Iter 5 | Iter 19 | Iter 20
आक श  | akash | 0.125  | 0.1413 | 0.415  | 0.976   | 0.976
ब क   | bank  | 0.083  | 0.1    | 0.074  | 0.049   | 0.049
तट    | bank  | 0.083  | 0.047  | 0.019  | 0.002   | 0.002
तट    | river | 0.142  | 0.169  | 0.353  | 0.499   | 0.499

Where did we get the Model 1 equations from? See the accompanying presentation model1_derivation.pdf for more on parameter training.

IBM Model 2
IBM Model 2 is also a special case of Generative Process 1.
Assumptions: uniform distribution for the length of f; alignments are no longer all equally likely, but are modelled by an alignment probability a(i|j,m,l), the probability that position j of f aligns to position i of e, given the lengths m and l.
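Written out, Model 2 keeps the Model 1 lexical term but replaces the uniform alignment factor with a(i|j,m,l):

```latex
P(f, a \mid e) \;=\; \epsilon \prod_{j=1}^{m} t(f_j \mid e_{a_j})\; a(a_j \mid j, m, l)
```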

Model 2 Training Algorithm
Initialize all t(f|e) and a(i|j,m,l) to any value in [0,1]. Repeat the E-step and the M-step until the t(f|e) values converge.
E-step: for each sentence s in the training corpus, and for each (f, e) pair in it, compute c(f|e; f(s), e(s)) and c(i|j,m,l), using the t(f|e) and a(i|j,m,l) values from the previous iteration.
M-step: for each (f, e) pair, compute t(f|e) using the c(f|e) values computed in the E-step; re-estimate a(i|j,m,l) from the c(i|j,m,l) values in the same way.
The training process is as in Model 1, except that the equations become messier!
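The expected counts in the E-step come from the alignment posterior; for Model 2 it is the expression below (Model 1 uses the same expression with the a(·) factors dropped, which makes it uniform over i):

```latex
P(a_j = i \mid f, e) \;=\; \frac{t(f_j \mid e_i)\, a(i \mid j, m, l)}{\sum_{i'=0}^{l} t(f_j \mid e_{i'})\, a(i' \mid j, m, l)}
```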

References
1. Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Robert Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 1993.
2. Kevin Knight. A Statistical MT Tutorial Workbook. 1999.
3. Philipp Koehn. Statistical Machine Translation. 2008.

Generative Process 2
For each word e_i in sentence e:
Select the number of words of f to generate.
Select the words to generate.
Permute the words.
Then choose the number of words in f that are not aligned to any word in e, choose those words, and insert them into the proper locations.

Generative Process 2 (example)
The slide's figure illustrates the process on "The Indian Railways is one of the largest employers in the world": Hindi words are generated for each English word and then permuted into their Hindi order.
This process serves as the basis for IBM Models 3 to 5.

Generative Process 2 (contd.)