Simultaneous German-English Lecture Translation Muntsin Kolss, Matthias Wölfel, Florian Kraft, Jan Niehues, Matthias Paulik, Alex Waibel


Simultaneous German-English Lecture Translation Muntsin Kolss, Matthias Wölfel, Florian Kraft, Jan Niehues, Matthias Paulik, Alex Waibel IWSLT 2008, October 21, 2008

Simultaneous Lecture Translation: Challenges (for German-English)

Unlimited domain:
- Wide variety of topics
- Lectures often go deeply into detail: specialized vocabulary and expressions

Spoken language:
- Most lecturers are not professionally trained speakers
- Conversational speech, more informal than prepared speeches
- Long monologues, often not easily separable into utterances with sentence boundaries

Strict real-time and latency requirements

German-English specific:
- English words embedded in German, especially technical terms
- German compounds
- Long-distance word reordering

System Overview

English Words in German Lectures

                    German   English    Both   Unknown
Total Words           4195       110    1397       887
Deletions               52         1      44         0
Insertions              58         9      37         2
Substitutions:
  German                258        37      91       113
  English                 7         6       8         7
  Both                   68        10      33        56
  Unknown                 5         3       2         4
Total Error             448        66     215       182
WER                   10.7%     60.0%   15.4%     20.5%

English Words in German Lectures

Two approaches:
- Parallel: use two phoneme sets in parallel, one each for German and English
- Mapping: map the English pronunciation dictionary to German phonemes

WER by word language:

             All    German   English    Both   Unknown
Baseline   13.8%     10.7%     60.0%   15.4%     20.5%
Mapping    12.7%     11.1%     34.6%   13.8%     16.1%
Parallel   13.4%     11.4%     26.4%   14.7%     18.9%
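The mapping approach can be pictured as a simple phone-substitution table applied to the English pronunciation dictionary. The sketch below is a toy illustration, not the authors' actual mapping: the phone inventory and the chosen German counterparts are assumptions made up for this example.

```python
# Toy sketch of mapping English pronunciations onto German phones so that
# embedded English words can be recognized with the German acoustic model.
# The phone symbols and substitutions below are illustrative assumptions.
EN_TO_DE_PHONE = {
    "AE": "E",   # English "cat" vowel -> closest German short e
    "TH": "S",   # German has no voiceless dental fricative
    "DH": "Z",   # ...nor a voiced one
    "W":  "V",   # English w -> German v sound
}

def map_pronunciation(en_phones):
    """Map an English phone sequence onto German phones (identity fallback)."""
    return [EN_TO_DE_PHONE.get(p, p) for p in en_phones]

print(map_pronunciation(["DH", "AE", "T"]))  # -> ['Z', 'E', 'T']
```

In a real system the mapping would be built per phone in context, but the principle is the same: every English dictionary entry becomes pronounceable with the German phone set.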

Machine Translation: Adaptation to Lectures

Training data:
- German-English EPPS, News Commentary, Travel Expression Corpus
- 100K corpus of German lectures held at Universität Karlsruhe, transcribed and translated into English

BLEU scores:
                                         Dev     Test
Baseline                                31.54   27.18
Language Model (LM) adaptation          33.11   29.17
Translation Model (TM) adaptation       33.09   30.46
LM and TM adaptation                    34.00   30.94
+ Rule-based word reordering            34.59   31.38
+ Discriminative word alignment         35.24   31.40

Automatic Simultaneous Translation: Input Segmentation

- Text translation: source sentence -> MT decoder -> target sentence
- Speech translation (turn-based, push-to-talk dialog systems): source utterance -> MT decoder -> target utterance
- Simultaneous translation: continuous ASR input -> segmentation -> MT decoder -> target segment
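The segmentation-based pipeline in the last bullet can be sketched in a few lines. This is a minimal illustration under assumed names: `translate_segment` is a stand-in for the MT decoder, and the fixed-length cutting rule is the simplest possible segmenter.

```python
# Minimal sketch of segmentation-based simultaneous translation: the ASR
# word stream is cut into fixed-length chunks, and each chunk is translated
# independently of all others (a hard decision at every boundary).
def translate_segment(words):
    # Placeholder for the real MT decoder.
    return ["<tgt:%s>" % w for w in words]

def segment_and_translate(stream, segment_length):
    output, buffer = [], []
    for word in stream:
        buffer.append(word)
        if len(buffer) == segment_length:
            # No phrase, reordering, or LM context crosses this boundary.
            output.extend(translate_segment(buffer))
            buffer = []
    if buffer:  # flush the final partial segment
        output.extend(translate_segment(buffer))
    return output
```

The hard boundary in the middle of the loop is exactly where the disadvantages listed on the next slide arise.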

Low latency translation is easy

[Figure: BLEU [%] (0-40) as a function of fixed segment length, from 1 word to 10K words.]

Disadvantages of Input Segmentation

- Choosing meaningful segment boundaries is difficult and error-prone
- No recovery from segmentation errors; input segmentation makes hard decisions
- Phrases that would match across segment boundaries can no longer be used
- No word reordering across segment boundaries is possible
- Language model context is lost across segment boundaries
- If the language model is trained on sentence-segmented data, there will often be a mismatch for the begin-of-sentence and end-of-sentence LM events

Phrase-based SMT Decoder

Example of phrase-based decoding:
Source (Spanish): he escuchado relacionarlo con valores tradicionales
Output (English): I have heard traditional values referred to

Stream Decoding: Continuous Translation Lattice

...and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe...

No input segmentation: process an infinite input stream from the speech recognizer, extending and truncating the translation lattice word by word as new input arrives (e.g. "in", "in which", "in which we", "in which we use", ...).

Stream Decoding: Asynchronous Input and Output

- Each incoming source word from the recognizer triggers a new search through the current translation lattice
- Output of the resulting best hypothesis is partially or completely delayed until either a time-out occurs or new input arrives, which leads to lattice expansion and a new search
- This creates a sliding window during which translation output lags the incoming source stream
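The trigger-and-defer behavior above can be sketched as a tiny event loop. All names here are illustrative assumptions, not the authors' code; `search` stands in for the actual lattice search.

```python
# Sketch of the asynchronous stream-decoding loop: every incoming source
# word triggers a fresh search over everything received so far, while the
# decision of how much of the hypothesis to actually output is deferred
# (until the next word or a time-out, which this toy does not model).
class StreamDecoder:
    def __init__(self):
        self.source = []  # source words received so far

    def search(self):
        # Stand-in for the lattice search: "translate" word by word.
        return ["<tgt:%s>" % w for w in self.source]

    def on_word(self, word):
        # New input -> lattice expansion -> new search over the full window.
        self.source.append(word)
        return self.search()
```

Because each `on_word` call revisits the whole pending window, earlier tentative hypotheses can still be revised before any output is committed.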

Stream Decoding: Output Segmentation

Decide which part of the current best translation hypothesis to output, if any at all:
- Minimum latency L_min: the translation covering the last L_min untranslated source words received from the speech recognizer is never output (except for time-outs)
- Maximum latency L_max: when the latency reaches L_max source words, translation output covering the source words exceeding this value is forced

Stream Decoding: Output Segmentation

- Backtrace the hypothesis until L_min source words have been passed
- If the hypothesis reached contains reordering gaps, continue backtracing until a state with no open reorderings
- If no such state can be found, perform a new restricted search that only expands hypotheses with no open reorderings at the node where the maximum latency would be exceeded
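The L_min/L_max policy can be condensed into a small decision rule. The function below is a toy version under stated assumptions: it counts words only, ignores reordering gaps and time-outs, and its name and signature are made up for this sketch.

```python
def words_to_emit(received, translated, l_min, l_max):
    """Toy L_min/L_max output policy.

    The last l_min source words received are never covered by output;
    once the untranslated backlog exceeds l_max source words, output is
    forced until only l_min words remain pending. Reordering gaps and
    time-outs from the slides are not modeled here.
    """
    backlog = received - translated  # source words not yet covered by output
    if backlog > l_max:
        return backlog - l_min       # forced output down to minimum latency
    return 0                         # otherwise keep waiting
```

For example, with L_min = 3 and L_max = 6, a backlog of 8 untranslated words forces output covering 5 of them, leaving 3 pending.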

Stream Decoding Performance under Latency Constraints

L_min and L_max chosen to optimize translation quality.

[Figure: BLEU [%] (0-40) vs. segment length (1 word to 10K words), comparing fixed segment length, keeping LM state, acoustic features, and stream decoding.]

Choosing Optimal Parameter Values for L_min and L_max

[Figure: BLEU [%] (0-40) as a function of minimum latency (0-10) and maximum latency (1-9).]

Summary

- Current system for simultaneous translation of German lectures into English combines state-of-the-art ASR and SMT components
- ASR system modified to handle German compounds, and English terms and expressions embedded in German lectures
- SMT system uses additional compound splitting and model adaptation to the topic and style of lectures
- Experiments with stream decoding to reduce latencies of the overall system
- Generated translation output provides a good idea of what the German lecturer said
- Major challenge for the future: better addressing long-range word reordering requirements between German and English
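The compound splitting mentioned in the summary is commonly done by checking whether a long German word decomposes into known shorter words. The sketch below is a toy greedy longest-match splitter, not the system's actual method; the vocabulary is an illustrative assumption.

```python
# Toy German compound splitter: greedily take the longest known prefix,
# then continue on the remainder. Real splitters score alternative splits
# (e.g. by corpus frequency); this sketch only checks vocabulary membership.
VOCAB = {"donau", "dampf", "schiff", "fahrt"}  # illustrative word list

def split_compound(word, vocab=VOCAB):
    parts, rest = [], word.lower()
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocab:
                parts.append(rest[:end])
                rest = rest[end:]
                break
        else:
            parts.append(rest)  # no known prefix: keep the remainder unsplit
            rest = ""
    return parts

print(split_compound("Dampfschifffahrt"))  # -> ['dampf', 'schiff', 'fahrt']
```

Splitting compounds this way lets the SMT system reuse translations of the parts even when the full compound was never seen in training.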