The JHU WS2006 IWSLT System Experiments with Confusion Net Decoding


The JHU WS2006 IWSLT System Experiments with Confusion Net Decoding
Wade Shen, Richard Zens, Nicola Bertoldi and Marcello Federico

Outline
- Spoken Language Translation
  - Motivations
  - ASR and MT: Statistical Approaches
- Confusion Network Decoding
  - Confusion Networks
  - Decoding of Confusion Network Input
  - Other Applications of Confusion Networks
- Factored Models for TrueCasing
- Evaluation Experiments

Motivations: Spoken Language Translation
Translation from speech input is likely more difficult than translation from text:
- the input spans many styles and genres: formal read speech, unplanned speeches, interviews, spontaneous conversations, ...
- less controlled language: relaxed syntax, spontaneous speech phenomena
- automatic speech recognition is prone to errors: possible corruption of syntax and meaning
Better integration of ASR and MT is needed to improve spoken language translation.

Combining ASR and MT
- Transcription word-error-rate correlates with translation quality.
- Better transcriptions may have existed during ASR decoding but been pruned away before the 1-best hypothesis.
- There is thus potential to improve translation quality by exploiting more of the transcription hypotheses generated during ASR.

Spoken Language Translation: Statistical Approach
Let o be the foreign-language speech input and F(o) a set of possible transcriptions of o.
Goal: find the best translation e* under the approximation

  e* = argmax_e max_{f in F(o)} Pr(e, f | o)

Pr(e, f | o) is computed with a log-linear model combining:
- Acoustic features: i.e. probabilities that certain foreign words are in the input
- Linguistic features: i.e. probabilities of the foreign and English sentences
- Translation features: i.e. probabilities of foreign phrases translating into English ones
- Alignment features: i.e. probabilities for word re-ordering
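
The log-linear combination above can be sketched in a few lines. The feature names and weight values below are invented for illustration; a real system tunes the weights on held-out data (e.g. with MERT).

```python
# Minimal sketch of a log-linear model: score(e, f) = sum_i lambda_i * h_i(e, f).
# Feature names and numbers are placeholders, not values from the paper.

def log_linear_score(features, weights):
    """Combine per-feature log-scores h_i with tuned weights lambda_i."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"acoustic": 0.5, "lm": 0.6, "translation": 1.0, "distortion": 0.3}
features = {"acoustic": -12.4, "lm": -8.1, "translation": -5.7, "distortion": -2.0}

score = log_linear_score(features, weights)  # higher is better across hypotheses
```

The decoder computes this score for every candidate (e, f) pair and keeps the best one.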

ASR Word Graph
A very general set of transcriptions can be represented by a word graph:
- directly computed from the ASR word lattice (e.g. HTK format, lattice-tool)
- provides a good representation of all hypotheses analyzed by the ASR system
- arcs are labeled with words and with acoustic and language-model probabilities
- paths correspond to transcription hypotheses, for which probabilities can be computed
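
As a toy illustration of how path probabilities fall out of such a graph: each arc carries a word and a log-probability, and a path's score is the sum of its arc scores. The states, words, and probabilities below are invented, not from any real lattice.

```python
import math
from collections import defaultdict

# Toy word graph: arcs are (from_state, to_state, word, log_prob).
# In a real lattice the log_prob combines acoustic and LM scores.
arcs = [
    (0, 1, "we", math.log(0.7)), (0, 1, "he", math.log(0.3)),
    (1, 2, "cancel", math.log(0.6)), (1, 2, "can", math.log(0.4)),
]

def enumerate_paths(arcs, start=0, final=2):
    """All start-to-final paths with their total log-probabilities, best first."""
    out = defaultdict(list)
    for s, t, w, lp in arcs:
        out[s].append((t, w, lp))

    def walk(state, words, lp):
        if state == final:
            yield (tuple(words), lp)
            return
        for t, w, arc_lp in out[state]:
            yield from walk(t, words + [w], lp + arc_lp)

    return sorted(walk(start, [], 0.0), key=lambda p: -p[1])

best_words, best_lp = enumerate_paths(arcs)[0]  # ("we", "cancel"), log(0.42)
```

Real graphs are far too large to enumerate exhaustively, which motivates the approximations compared on the next slide.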

Overview of SLT Approaches
- 1-best Translation: translate the most probable word-graph path
  Pros: most efficient
  Cons: no potential to recover from recognition errors
- N-best Translation: translate the N most probable paths
  Cons: least efficient (cost grows linearly with N); N must be large in order to include good transcriptions
- Finite State Transducer: compose the word graph with a translation FSN
  Pros: most straightforward; can examine the full word graph
  Cons: prohibitive with large vocabularies and long-range re-ordering
- Confusion Network: translate a linear approximation of the word graph
  Pros: can effectively explore the graph without re-ordering problems
  Cons: can overgenerate the input word graph

Outline
- Spoken Language Translation
  - Motivations
  - ASR and MT: Statistical Approaches
- Confusion Network Decoding
  - Confusion Networks
  - Decoding of Confusion Network Input
  - Other Applications of Confusion Networks
- Factored Models for TrueCasing
- Evaluation Experiments

Confusion Networks
A confusion network (CN) approximates a word graph with a linear network, such that:
- arcs are labeled with words or with the empty word (the ε-word)
- arcs are weighted with word posterior probabilities
CNs can be conveniently represented as a sequence of columns of different depths.
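
The column representation is compact enough to sketch directly: each column maps a word (or the empty word, written here as "") to its posterior probability. The words and probabilities below are invented for illustration.

```python
# A confusion network as a list of columns; each column is a dict
# {word: posterior}, with "" standing in for the empty (epsilon) word.
cn = [
    {"we": 0.8, "he": 0.2},
    {"cancel": 0.6, "can": 0.3, "": 0.1},
    {"it": 1.0},
]

def consensus(cn):
    """Consensus hypothesis: the highest-posterior word in each column,
    with empty words dropped from the output."""
    picks = [max(col, key=col.get) for col in cn]
    return [w for w in picks if w]

# consensus(cn) picks one word per column independently
```

Note that, unlike in the word graph, any combination of one arc per column is a valid path, which is exactly how a CN can overgenerate the original graph.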

Confusion Network Decoding Process
An extension of the basic phrase-based decoding process:
- cover some not-yet-covered consecutive columns (a span)
- retrieve phrase translations for all paths inside those columns
- compute translation, distortion and target language model scores
Example: coverage vector = 01110, path = cancello d
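
The second step above, enumerating every source phrase inside a span of columns, can be sketched as follows; the columns and probabilities are invented placeholders.

```python
from itertools import product

# Toy CN span; "" is the empty word and is skipped when forming phrases.
cn = [
    {"the": 0.9, "a": 0.1},
    {"gate": 0.7, "": 0.3},
]

def span_phrases(cn, start, end):
    """All word sequences through columns [start, end), with their
    path posteriors (product of the chosen words' posteriors)."""
    cols = cn[start:end]
    phrases = []
    for choice in product(*(col.items() for col in cols)):
        words = tuple(w for w, _ in choice if w)  # drop empty words
        prob = 1.0
        for _, p in choice:
            prob *= p
        phrases.append((words, prob))
    return phrases

# span_phrases(cn, 0, 2) yields one entry per path through the two columns
```

The number of such paths is the product of the column depths, i.e. exponential in the span length, which is the computational issue the next slide addresses.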

Confusion Net Decoding: Moses Implementation
Computational issues:
- the number of paths grows exponentially with the span length, which implies looking up translations for a huge number of source phrases
- factored models require considering joint translations over all factors (tuples): the Cartesian product of the translations of each single factor
Solutions implemented in Moses:
- source entries of the phrase table are stored with prefix trees
- translations of all possible coverage sets are pre-fetched from disk
- efficiency is achieved by pre-fetching incrementally over the span length
- phrase translations over all factors are extracted independently; translation tuples are then generated and pruned, adding one factor at a time
Once translation tuples are generated, the usual decoding applies.
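
Why a prefix tree helps with incremental pre-fetching can be shown with a tiny in-memory trie standing in for Moses' on-disk phrase table (the phrases and column words below are invented): extending a span by one column only extends prefixes that already matched, instead of re-matching every path from scratch.

```python
# Sketch of prefix-tree pre-fetching over CN columns. This is an illustrative
# in-memory trie, not Moses' actual on-disk data structure.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.translations = []  # non-empty only for phrases in the table

def build_trie(phrase_table):
    root = TrieNode()
    for source, translation in phrase_table:
        node = root
        for word in source:
            node = node.children.setdefault(word, TrieNode())
        node.translations.append(translation)
    return root

phrase_table = [(("the",), "il"), (("the", "gate"), "il cancello")]
root = build_trie(phrase_table)

# Incremental pre-fetch: keep the trie nodes reached so far and extend each
# one with every word alternative in the next CN column.
frontier = [root]
for column_words in [["the", "a"], ["gate"]]:
    frontier = [node.children[w] for node in frontier
                for w in column_words if w in node.children]

matches = [t for node in frontier for t in node.translations]
```

Paths that match no table prefix fall out of the frontier immediately, so the exponential path count never has to be enumerated in full.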

Other Applications of Confusion Nets
- Linguistic annotation for factored models: avoid hard decisions by linguistic tools and instead provide alternative annotations with their respective scores, e.g. for particularly ambiguous part-of-speech tags
- Translation of input similar to that produced by speech recognition, e.g. OCR output for optical text translation
- Insertion of punctuation marks missing from the input: model all possible insertions of punctuation marks in the input ...
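
The punctuation idea can be sketched by interleaving each word column with a column in which punctuation marks compete against the empty word. The probabilities here are flat placeholders; a real system would score the insertions with a punctuation model.

```python
# Build a CN that models optional punctuation after every word.
# p_punct and the mark inventory are illustrative assumptions.
def with_optional_punct(words, p_punct=0.2, marks=(",", ".")):
    cn = []
    share = p_punct / len(marks)
    for w in words:
        cn.append({w: 1.0})                 # the word itself is certain
        col = {m: share for m in marks}     # candidate punctuation marks
        col[""] = 1.0 - p_punct             # or no punctuation at all
        cn.append(col)
    return cn

cn = with_optional_punct(["hello", "world"])
# one punctuation column after each word column
```

The decoder then chooses among the punctuation arcs exactly as it chooses among ASR word alternatives.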

Outline
- Spoken Language Translation
  - Motivations
  - ASR and MT: Statistical Approaches
- Confusion Network Decoding
  - Confusion Networks
  - Decoding of Confusion Network Input
  - Other Applications of Confusion Networks
- Factored Models for TrueCasing
- Evaluation Experiments

Factored Models
Factored representation: both source and target words are bundles of factors (surface form, lemma, morphology).
Model components, combined in a log-linear way:
- Translation models: map source factors to target factors
- Generation models: map between target factors
- Target language models: can be applied to different factors
Benefits:
- Generalization: gather statistics over generalized classes
- Richer models: can make use of different linguistic representations

Factored Models for TrueCasing
Let w be the lowercased (uncased) word sequence and W the TrueCased word sequence.
Components:
- a mixed-case language model
- a generation model from lowercased to TrueCased forms
Translate the lowercased text, generate the TrueCase, and apply language models to both factors.
Integrated into decoding: the generation and language models are jointly optimized with the other translation models, using a Powell-like MER procedure.
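
The interaction of the two components can be sketched in miniature: a generation table proposes cased variants of each lowercased word, and a mixed-case language model scores the resulting sequences. Both tables below are toy stand-ins for the models Moses learns from data, and the unigram "LM" is a deliberate simplification.

```python
from itertools import product

# Toy generation model: lowercased word -> {cased variant: generation prob}.
generation = {
    "new": {"new": 0.6, "New": 0.4},
    "york": {"york": 0.2, "York": 0.8},
}

def lm_score(words):
    """Toy unigram mixed-case LM that rewards the capitalized entity tokens."""
    known = {"New": 0.5, "York": 0.5}
    score = 1.0
    for w in words:
        score *= known.get(w, 0.1)
    return score

def truecase(lowercased):
    """Pick the cased sequence maximizing generation prob * LM score."""
    candidates = product(*(generation[w].items() for w in lowercased))

    def joint(cand):
        gen = 1.0
        for _, p in cand:
            gen *= p
        return gen * lm_score([w for w, _ in cand])

    best = max(candidates, key=joint)
    return [w for w, _ in best]
```

In the integrated setup these scores are simply two more weighted features in the decoder's log-linear model, rather than a separate post-processing pass.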

Outline
- Spoken Language Translation
  - Motivations
  - ASR and MT: Statistical Approaches
- Confusion Network Decoding
  - Confusion Networks
  - Decoding of Confusion Network Input
  - Other Applications of Confusion Networks
- Factored Models for TrueCasing
- Evaluation Experiments

Dev and Eval Corpus Statistics
Tables (not reproduced in this transcription):
- Training set statistics (same models as MIT/LL)
- Dev4 confusion network statistics
- Dev4 and test word error rates

Results
Tables (not reproduced in this transcription):
- Overall results
- Confusion net punctuation (dev4)
- Factored truecasing (dev4)

Conclusions and Follow-on Work
- Confusion net decoding shows significant gains, especially for spontaneous speech: up to 6.4% relative improvement (higher WER?)
- Confusion nets may be helpful for coupling MT with preprocessing steps: clear benefits with ASR, modest benefits with repunctuation
- Single-pass TrueCasing may be helpful: joint decoding yields a 2.0% relative increase
- Moses is available (open source) for research: http://www.statmt.org/moses/