Using Word Posterior in Lattice Translation


Using Word Posterior in Lattice Translation
Vicente Alabau
Institut Tecnològic d'Informàtica
e-mail: valabau@iti.upv.es
October 16, 2007

Index
- Motivation
- Word Posterior Probabilities
- Translation System
- Results
- Conclusions and Future Work

Motivation - Common approaches

Serial approach:
  + simple and fast
  - propagates errors from the ASR
Semi-coupled approach:
  n-best: + simple; - redundant, time-consuming
  lattice: + full search space; - time-consuming
  confusion network: + simplified lattice, efficient; - loss of grammar
Integrated approach:
  + theoretically promising
  - poor performance on non-simple corpora

Word Posterior Probabilities - Motivation

- One should maximize word posterior probabilities to minimize WER (Mangu et al., 2000)
- Confusion networks (Bertoldi and Federico, 2005): word posterior probabilities + lattice simplification

Our approach:
- Word posterior probabilities computed over the lattice
- Take advantage of techniques from confidence measures (Sanchis, 2004)

Word Posterior Probabilities: Forward-Backward

With w the hypothesized word, s the start node, and e the end node:

P([w,s,e] \mid x_1^T) = \frac{1}{P(x_1^T)} \sum_{\substack{f_1^J \in G :\, [w',s',e'] \in f_1^J \\ w'=w,\ s'=s,\ e'=e}} P(f_1^J, x_1^T) \qquad (1)

[Figure: example lattice over frames 1..T with a posterior probability attached to each word edge.]
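Equation (1) can be computed with a standard forward-backward pass over the lattice. The sketch below is illustrative, not the paper's implementation: it assumes an acyclic lattice given as `(word, src, dst, prob)` edges in topological order (every edge going from a lower-numbered to a higher-numbered node), and returns the posterior mass of every [w, s, e] edge.

```python
from collections import defaultdict

def edge_posteriors(edges, start, final):
    """Forward-backward over an acyclic lattice.

    edges: list of (word, src, dst, prob) tuples in topological order.
    Returns P([w, s, e] | x): the summed probability of all complete
    paths through each edge, normalised by the total lattice probability.
    """
    alpha = defaultdict(float)          # forward: mass of paths start -> node
    alpha[start] = 1.0
    for w, s, e, p in edges:
        alpha[e] += alpha[s] * p

    beta = defaultdict(float)           # backward: mass of paths node -> final
    beta[final] = 1.0
    for w, s, e, p in reversed(edges):
        beta[s] += p * beta[e]

    total = alpha[final]                # P(x): mass of all complete paths
    post = defaultdict(float)           # sum edges sharing the same (w, s, e)
    for w, s, e, p in edges:
        post[(w, s, e)] += alpha[s] * p * beta[e] / total
    return dict(post)

# Tiny lattice: competing words "a"/"b" followed by "c".
edges = [("a", 0, 1, 0.6), ("b", 0, 1, 0.4), ("c", 1, 2, 1.0)]
posteriors = edge_posteriors(edges, 0, 2)
```

In the tiny example the two competing initial edges split the path mass between them, while the only final edge accumulates all of it.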

Word Posterior Probabilities - Frame-Time Maximum

Maximum of the frame-time posterior probability (Wessel et al., 2001):

P_t(w \mid x_1^T) = \sum_{[w, s', e'] :\, t \in [s', e']} P([w, s', e'] \mid x_1^T) \qquad (2)

P([w, s, e] \mid x_1^T) = \max_{s \le t \le e} P_t(w \mid x_1^T) \qquad (3)

[Figure: the same example lattice, rescored with frame-time maximum posteriors.]
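A minimal sketch of this rescoring (eqs. 2-3), assuming edge posteriors are keyed by `(word, start_frame, end_frame)`; the helper name is illustrative. It first accumulates, per frame and word, the mass of all edges labelled with that word spanning the frame, then scores each edge by the maximum over its own span:

```python
from collections import defaultdict

def frame_time_scores(edge_post):
    """Rescore edge posteriors with the frame-time maximum.

    edge_post: dict mapping (word, start_frame, end_frame) -> posterior.
    Eq. (2): frame[(t, w)] sums the posteriors of all edges labelled w
    that span frame t. Eq. (3): each edge takes the maximum of that
    frame posterior over its own span.
    """
    frame = defaultdict(float)                       # (t, w) -> P_t(w | x)
    for (w, s, e), p in edge_post.items():
        for t in range(s, e + 1):
            frame[(t, w)] += p

    return {(w, s, e): max(frame[(t, w)] for t in range(s, e + 1))
            for (w, s, e) in edge_post}
```

Overlapping edges that carry the same word reinforce each other, so slightly shifted segmentations of the same word no longer split its probability mass.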

Translation System

Log-linear model combining:
- Word posterior probabilities
- GIATI joint probability model:
  - n-grams of bilingual pairs: 5-gram (without cut-off)
  - integrated lattice search
  - monotone search
- Output word penalty
- Output language model (5-gram)
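The log-linear combination itself reduces to a weighted sum of feature scores per hypothesis; a minimal sketch (feature names and weights are illustrative, not the system's tuned values):

```python
def loglinear_score(features, weights):
    """Score one hypothesis as a weighted sum of its feature values
    (e.g. word posterior, bilingual n-gram log-prob, output LM
    log-prob, word penalty)."""
    return sum(weights[name] * value for name, value in features.items())

def best_hypothesis(candidates, weights):
    """candidates: list of (translation, feature_dict) pairs.
    The decoder keeps the hypothesis with the highest combined score."""
    return max(candidates, key=lambda cand: loglinear_score(cand[1], weights))
```

In practice the weights are tuned on a development set (e.g. toward BLEU); here they are plain constants.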

Translation System - Reordering

- Serial, 1BEST approach
- Monotonization of the output
- Translation with Moses from monotonized to regular word order
- Models: reordering table and output language model
- Monotone search

Preprocess and Postprocess

Preprocess:
- Case and punctuation removed from the training data
- Sentence splitting at sentence boundaries (. ? !)
- Lattice pruning

Postprocess:
- Punctuation and case restoration: IWSLT06 method using SRILM
- Capitalization after punctuation marks
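The preprocessing steps above can be sketched in a few lines; this is a minimal illustration, not the exact pipeline used in the system:

```python
import re

def preprocess(text):
    """Split at sentence boundaries (. ? !), then remove case and
    punctuation from each sentence, as described for the training data."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    cleaned = []
    for s in sentences:
        s = re.sub(r"[^\w\s]", "", s.lower())   # drop punctuation, lowercase
        cleaned.append(" ".join(s.split()))     # normalise whitespace
    return [s for s in cleaned if s]
```

The postprocessing direction (restoring punctuation and case) is the harder inverse problem, which is why a dedicated SRILM-based method is used for it.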

System architecture

[Figure: system architecture diagram.]

Corpus statistics

                        Italian   English
Train   Sentences        19,971
        Running words      172k      189k
        Vocabulary       10,152     7,165
Dev4    Sentences           489
        Running words     4,831     6,848
        OOV words           224       208
Dev5a   Sentences           500
        Running words     5,607     7,491
        OOV words           296       264
Dev5b   Sentences           996
        Running words     8,487    11,968
        OOV words           591       611
Test    Sentences           724
        Running words     6,420     9,054
        OOV words           542       439

Effect of adding features to the baseline model (primary run: 16.13 BLEU)

              dev4          dev5a         dev5b         test
              BLEU  NIST    BLEU  NIST    BLEU  NIST    BLEU  NIST
baseline      36.29 7.59    31.96 7.06    12.53 4.02    22.80 5.49
+WP           37.45 7.35    32.55 6.82    14.07 3.77    19.56 5.06
+OL           37.06 7.42    32.55 6.91    12.37 3.82    22.32 5.25
+WP+OL        38.19 7.20    32.67 6.66    13.44 4.20    21.83 5.57
+RM           37.53 7.95    32.74 7.41    13.94 4.30    23.92 5.79
+WP+OL+RM     38.98 7.81    32.86 7.18    14.34 4.37    23.22 5.86

WP: output word insertion penalty; OL: output language model; RM: reordering model.

Effect of adding the dev corpus to the training corpus (primary run: 16.13 BLEU)

              w/o dev       with dev
              BLEU  NIST    BLEU  NIST
baseline      22.80 5.49    31.29 6.66
+WP           22.09 5.56    12.16 2.97
+OL           22.79 5.52    30.83 6.64
+WP+OL        21.79 5.56    11.89 2.91
+RM           23.46 5.74    32.28 6.95
+WP+OL+RM     23.22 5.86    31.21 6.77

WP: output word insertion penalty; OL: output language model; RM: reordering model.

Results for different input conditions

         dev4          dev5a         dev5b         test
         BLEU  NIST    BLEU  NIST    BLEU  NIST    BLEU  NIST
1BEST    33.53 6.92    26.97 6.12    13.21 4.19    21.50 5.56
LAT      33.69 6.95    27.24 6.14    13.35 4.16    18.71 5.22
GER      34.11 7.02    27.49 6.18    13.90 4.29    22.64 5.77
CLEAN    38.98 7.81    32.86 7.18    14.34 4.37    23.22 5.86

LAT: lattice with word posterior probabilities; GER: using the sentence from the lattice with the lowest word error rate.
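The GER condition is an oracle: for each input it picks the lattice hypothesis with the lowest word error rate against the reference. Over an n-best approximation of the lattice this can be sketched as follows (function names are illustrative):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word sequences,
    normalised by reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (r[i - 1] != h[j - 1]))  # sub / match
        prev = cur
    return prev[len(h)] / max(len(r), 1)

def oracle(candidates, reference):
    """Pick the candidate hypothesis closest to the reference in WER."""
    return min(candidates, key=lambda hyp: wer(reference, hyp))
```

An exact lattice oracle would instead search the lattice itself for the minimum-WER path, but the n-best version conveys the idea.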

Conclusions

- Word posterior approach: results are not conclusive
- Small differences between the 1BEST and CLEAN scores
- Some improvements were achieved
- Pruning needs further work
- Adding the dev set to the training data matters

Future Work

- Comparison with n-best lists, confidence measures, and lattices with acoustic scores
- Additional state-of-the-art confidence features
- Additional translation features
- Features based on multiple lattices
- Lattice reduction

Thank you for your attention!
Vicente Alabau
valabau@dsic.upv.es

References

[Mangu et al., 2000] Mangu, L., Brill, E., and Stolcke, A. (2000). Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Computer Speech and Language, 14(4):373-400.
[Wessel et al., 2001] Wessel, F., Schlüter, R., Macherey, K., and Ney, H. (2001). Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9(3).
[Bertoldi and Federico, 2005] Bertoldi, N. and Federico, M. (2005). A new decoder for spoken language translation based on confusion networks. In IEEE Automatic Speech Recognition and Understanding Workshop.
[Sanchis, 2004] Sanchis-Navarro, J. A. (2004). Estimation and application of confidence measures in automatic speech recognition (in Spanish). PhD thesis, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia.