Language Model Adaptation for Statistical Machine Translation with Structured Query Models



Language Model Adaptation for Statistical Machine Translation with Structured Query Models
Bing Zhao, Matthias Eck, Stephan Vogel (CMU), Coling 2004
Presented by Sarah Schwarm, 11/10/2004

Goal: Language Model Adaptation
- Problem: insufficient in-domain LM training data
- Approach: unsupervised data augmentation by retrieving relevant documents from large monolingual corpora...
- ...and interpolating a model built from the retrieved data with a background LM

Approach
Test Data → Baseline SMT Decoder (first pass, with Translation Model and Background LM) → Translation Hypotheses → Query Reformulator → Queries → IR System (target language) → Domain-specific Data → Small Domain-specific LM → interpolate with Background LM → Combined LM → Second-pass SMT Decoder → Final Translation Hypotheses
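The interpolation step in this pipeline can be sketched as a simple linear mixture of the two language models. A minimal sketch, assuming unigram models stored as plain probability dicts and a hand-picked mixture weight `lam` (the paper interpolates full n-gram models and the toy probabilities below are not its values):

```python
# Linear interpolation of a small in-domain LM with a background LM.
# Unigram models as plain probability dicts; the mixing rule is the
# same for higher-order n-gram models.

def interpolate(p_domain, p_background, lam):
    """Combined P(w) = lam * P_domain(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_domain) | set(p_background)
    return {w: lam * p_domain.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}

# Toy distributions (hypothetical values, not from the paper):
domain = {"markets": 0.6, "union": 0.4}
background = {"markets": 0.1, "the": 0.9}
combined = interpolate(domain, background, lam=0.7)
# combined["markets"] = 0.7 * 0.6 + 0.3 * 0.1 = 0.45
```

Words unseen in one model simply get probability 0.0 from it, so the background LM keeps coverage of general vocabulary while the domain LM boosts in-domain words.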

Questions to Address
- Should we use only the 1-best, or the n-best hypotheses for query generation?
- How should queries be constructed: bag-of-words, or more structured?
- How many documents should be retrieved, and what is the scope of a document?

Results from [Eck 2004]
- Used data retrieved from a local index (Lemur IR system) rather than the web
- Used term frequency / inverse document frequency (tf-idf) for retrieval (outperformed two other IR techniques)
- Sentence-level retrieval outperforms story-level retrieval
- Big improvements in perplexity, smaller actual translation improvement
- Stemming and stopword removal were not helpful

Sentence Retrieval Process
- tf-idf queries are built from the translation hypotheses of the first-pass decoder
- Consider each sentence as its own document
- Convert the query and the sentences in the corpus into vectors, assigning a term weight to each word
- Calculate cosine similarity between the query and each sentence in the corpus
- Select the most similar 1-1000 sentences
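The retrieval steps above can be sketched in a few lines. A minimal sketch, assuming whitespace tokenization and a toy three-sentence corpus (the paper uses the Lemur IR system over GigaWord-scale collections):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf vectors, treating each sentence as its own document."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))
    idf = {w: math.log(n / df[w]) for w in df}
    vecs = []
    for d in docs:
        tf = Counter(d.split())
        vecs.append({w: tf[w] * idf[w] for w in tf})
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = ["the markets are open", "european union markets", "weather was fine"]
vecs, idf = tfidf_vectors(corpus)

# Query built from a (hypothetical) first-pass translation hypothesis:
query_tf = Counter("markets are open".split())
qvec = {w: query_tf[w] * idf.get(w, 0.0) for w in query_tf}

# Rank sentences by similarity and keep the top k:
ranked = sorted(range(len(corpus)), key=lambda i: cosine(qvec, vecs[i]), reverse=True)
```

Here `ranked[0]` is the sentence sharing the most query terms; in the real system the top 100-1000 sentences become the adaptation corpus.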

Bag-of-words Query Models (1/3)
1-best hypothesis as query model:
- w_i is a word in V_T1, the vocabulary of the top-1 hypothesis
- f_i is the frequency of w_i
Q_T1 = (w_1, w_2, ..., w_l) = {(w_i, f_i) | w_i ∈ V_T1}

Bag-of-words Query Models (2/3)
N-best hypotheses as query model:
Q_TN = (w_1,1, w_1,2, ..., w_1,l1; ...; w_N,1, w_N,2, ..., w_N,lN) = {(w_i, f_i) | w_i ∈ V_TN}
Benefits of Q_TN:
- Contains more translation candidates; more informative than Q_T1
- Confident translations occur more often, so they have a higher term frequency and more impact on retrieval
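The term-frequency effect described above can be shown directly: a word shared by many hypotheses ends up with a higher count in the query. A minimal sketch with a hypothetical 3-best list:

```python
from collections import Counter

def nbest_query(hypotheses):
    """Bag-of-words query Q_TN from an N-best list: words that appear in
    many hypotheses (confident translations) get a higher term frequency."""
    counts = Counter()
    for hyp in hypotheses:
        counts.update(hyp.split())
    return counts

# Hypothetical 3-best list for one source sentence:
hyps = ["the markets are open",
        "the market is open",
        "the markets are opened"]
q = nbest_query(hyps)
# "the" appears in all 3 hypotheses, "markets" in 2, "market" in 1
```

Words the decoder agrees on across hypotheses thus dominate the query, while one-off alternatives contribute only weakly.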

Bag-of-words Query Models (3/3)
Translation model as query model:
- Extract n-grams from the source sentence
- Collect all candidate translations from the TM
Q_TM = (w_s1,1, w_s1,2, ..., w_s1,n1; ...; w_sI,1, w_sI,2, ..., w_sI,nI) = {(w_i, f_i) | w_i ∈ V_TM}
- No decoding, no use of the background LM
- Q_TM is a generalization of Q_T1 and Q_TN, but subject to more noise

Structured Query Models
Word order and word proximity:
- Ignored by bag-of-words models
- Convey syntactic and semantic information
- Can be extracted from 1-best/n-best hypotheses and translation lattices

Structured Query Language
InQuery (Lemur Toolkit) provides four operators, including ordered and unordered proximity windows, for use in queries:
- Sum #sum(t_1, ..., t_n): all terms have equal influence (average of belief values)
- Weighted sum #wsum(w_1:t_1, ..., w_n:t_n): terms contribute in proportion to their weights
- Ordered distance operator #N(t_1 ... t_n): terms must appear in order, within N words of each other
- Unordered window operator #uwN(t_1 ... t_n): terms in any order within a window of N words

Structured Query Models (1/2)
Collect target n-grams:
- For 1-best/n-best hypotheses, collect the n-grams related to each source word
- For the TM, collect source n-grams and translate them to target n-grams
Model: a collection of subsets of target n-grams
Q_ST = {T_s1, T_s2, ..., T_sI}
where T_si is the set of target n-grams for the source word s_i:
T_si = {{t_i, ...}_1-gram; {t_i t_i+1, ...}_2-gram; {t_i-1 t_i t_i+1, ...}_3-gram; ...}

Structured Query Models (2/2)
Example: a sum of frequency-weighted sums, one per source word:
#q=#sum( #wsum(2 eu 2 #phrase(european union)) #wsum(12 #phrase(the united states) 1 american 1 #phrase(an american)) #wsum(4 are 1 is) #wsum(8 markets 3 market) #wsum(7 #phrase(the main) 5 primary) );
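A sketch of how such a query string might be assembled from per-source-word translation alternatives. The #sum/#wsum/#phrase syntax follows the slide's example; the input weights and alternatives below are hypothetical:

```python
def wsum(alternatives):
    """Render one #wsum(...) clause from (weight, phrase) pairs;
    multi-word phrases are wrapped in #phrase(...)."""
    parts = []
    for weight, phrase in alternatives:
        term = f"#phrase({phrase})" if " " in phrase else phrase
        parts.append(f"{weight} {term}")
    return "#wsum(" + " ".join(parts) + ")"

def structured_query(per_word_alternatives):
    """One weighted sum per source word, combined under #sum(...)."""
    return "#q=#sum(" + " ".join(wsum(a) for a in per_word_alternatives) + ");"

# Hypothetical alternatives for two source words:
alts = [[(2, "eu"), (2, "european union")],
        [(4, "are"), (1, "is")]]
query = structured_query(alts)
# → "#q=#sum(#wsum(2 eu 2 #phrase(european union)) #wsum(4 are 1 is));"
```

The weights here play the role of term frequencies from the n-best list or TM, so confident alternatives pull retrieval harder than rare ones.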

Experiments
- Test set: 878 sentences from the NIST June 2002 Chinese-to-English MT evaluation
- NIST and BLEU scores reported, with 4 reference translations per sentence
- Baseline model: TM training data of 284k parallel sentence pairs; LM training data of 160 million words of general English news text
- LM adaptation corpora: 4 collections from the GigaWord corpora (English news text)
- Preprocessing: lowercasing, punctuation separated, no stopword removal

Results: Bag-of-words Models
- All adapted LMs outperformed the baseline
- Data from the AFE corpus gave the best improvement
- Used a 100-best list for the Q_TN model: only 9 times bigger than Q_T1 (1-best)
- Retrieval of 100 sentences was best
- Overall, Q_TN gave the best results: more alternatives than Q_T1
- Q_TM probably contributed bad alternatives as well as good ones

Results: Structured Models
- Using more retrieved data (1000 sentences) gives better results
- The structured model built from Q_TM performs best; it appears to reduce noise in the retrieved data

Oracle Experiment
- Use reference translations to retrieve the adaptation data (4000 sentences)
- Higher BLEU and NIST scores show room for improvement
- Better first-pass translations lead to better retrieved data, which leads to better second-pass translations; could we iterate?
- Results are still limited by the TM and decoder

Summary and Future Work
- LM adaptation by retrieving sentences similar to the initial translations improves performance
- Structured queries that capture word order outperform bag-of-words queries
Future work:
- Will larger corpora for retrieval of adaptation data improve performance?
- Can translation probabilities be included in the queries?