Rapid Deployment of an Afrikaans-English Speech-to-Speech Translator. Herman Engelbrecht, Tanja Schultz

Outline
Background and Motivation
Language Characteristics: Afrikaans
Development Strategy
Data Resources
Component Development
Integrated Speech Translation System
Results
Conclusions

Background
Africa: 2000+ living languages

Background
South Africa
Population: 46.6 million
Official Languages: 11

Motivation
The HLT community in South Africa is small, and there has been no significant recent MT research activity.
The S.A. government is interested in building S.A. HLT capacity (especially speech-to-speech translation).
One PhD student was sent to CMU on a 3-month fellowship to study speech-to-speech (S2S) translation so that S2S translation can be developed for local languages.
Rapid deployment of an S2S system for the Afrikaans-English language pair was used as the vehicle for studying S2S translation.

Language Characteristics: Afrikaans
Afrikaans is a Germanic language and is linguistically closely related to Dutch.
Afrikaans has a more regular grammar than Dutch, and the grammar is very analytic.
Afrikaans text is written using the Latin alphabet plus a few diacritics.
Afrikaans spelling is more phonetic than Dutch spelling.
Words are separated by spaces in text, so there is no need for a word boundary segmenter and SMT algorithms can be readily applied.
62 phones are typically used in spoken Afrikaans.

Development Strategy
The choice of recognition, translation and synthesis strategies was influenced by the amount of time and labor-intensive work required to implement each strategy.
Data-driven techniques were preferred over knowledge-based techniques, and the following strategies were adopted:
  Recognition: SLM-based recognition strategy.
  Translation: Statistical machine translation.
  Synthesis: Concatenative synthesis.
The focus was on the development of the ASR, MT and TTS components for the new language.

Development Strategy
The following subcomponents had to be developed or obtained for Afrikaans:
  ASR: Acoustic Models, Language Models and Pronunciation Lexicon.
  SMT: Translation Models and Language Models.
  TTS: Pronunciation Lexicon and Letter-to-Sound Rules.
For English, existing ASR and TTS components developed by CMU were used.
The available data resources constrained the domain of the system to parliamentary debates (Hansards).

Data Resources
Text Data:
  Parallel Afrikaans-English text corpus (Hansards).
Speech Data:
  Afrikaans speech data from the AST speech corpus (based on the SpeechDat corpus).
  Hansard speech data recorded during the fellowship.
Pronunciation Lexicon:
  5k lexicon obtained from the AST speech corpus (includes pronunciation variants).
  37k lexicon from the University of Stellenbosch (no pronunciation variants, but includes syllable markers).

Data Resources
Parallel Text Corpus (Hansards):
  43k parallel sentences.
  ±700k words per language.
  ±20k vocabulary per language.
AST speech data (out-of-domain):
  72% Telephone, 28% Mobile phone.
  57% Female, 43% Male.
  Roughly 6 hours of transcribed speech.
  265 speakers, ±40 utterances per speaker.
Hansard speech data (in-domain):
  1000 prompted utterances recorded on a laptop by two native Afrikaans speakers (male and female).
  Utterances chosen from the parallel text corpus.

Component Development - ASR
Bootstrapped acoustic models from GlobalPhone 7-lingual models using the Janus Recognition Toolkit (JRTk).
39 phone models, 1 silence model:
  13 vowels, 26 consonants, no diphthongs.
  No distinction between long and short vowels.
Fully continuous 3-state HMM recogniser:
  500 triphone models (tied using decision trees).
  128 Gaussians per state.
  13 MFCCs, power, and their first and second derivatives, reduced to 32 dimensions using LDA.
  Trained with VTLN and SAT.
Training data (out-of-domain): 187 speakers, 7696 utterances.

Component Development - ASR
Hansard adaptation data (in-domain): 200 utterances, 2 speakers.
Hansard evaluation data (in-domain): 800 utterances, 2 speakers.

                          Unadapted AMs   Adapted AMs
Number of words           15,259          15,259
Vocabulary size           2,450           2,450
Pronunciation variants    1.08            1.08
Trigram LM perplexity     103.71          103.71
WER (male)                39.1%           17.6%
WER (female)              54.0%           22.3%
WER (total)               46.5%           20.0%

Component Development - SMT
PESA used for training.
IBM1 model.
Trigram LMs trained using the SRILM software.
Trained both Afrikaans-English and English-Afrikaans translation models.
Experimented with punctuation included and with punctuation removed from the text.
Hansard Parallel Text Corpus:
  Train set: 41,239 utterances.
  Test set: 800 utterances (same as used for ASR).
  Sentences aligned using the Europarl sentence aligner.
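The slide names IBM Model 1 as the translation model but does not describe how PESA trains it. As a rough illustration only, the following is the textbook EM procedure for IBM Model 1 lexical probabilities t(e | f); the function names and the three-sentence toy corpus are invented for this sketch and are not taken from PESA or the Hansard data.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t(e | f) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (e, f) to t(e | f).
    """
    # Prepend a NULL source token so target words can align to "nothing".
    pairs = [(["<NULL>"] + list(src), list(tgt)) for src, tgt in pairs]
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # Uniform initialisation over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f being aligned
        for src, tgt in pairs:
            for e in tgt:
                # E-step: posterior over which source word generated e.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per source word.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return dict(t)


def best_translation(t, f):
    """Return the target word e with the highest t(e | f)."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)


# Invented Afrikaans-English toy corpus (hypothetical, not from the Hansards).
toy_corpus = [
    ("die huis".split(), "the house".split()),
    ("die boek".split(), "the book".split()),
    ("'n boek".split(), "a book".split()),
]
t = train_ibm1(toy_corpus)
for f in ["die", "huis", "boek", "'n"]:
    print(f, "->", best_translation(t, f))
```

Because Model 1 ignores word order and uses only co-occurrence statistics, it is quick to train, which fits the rapid-deployment goal; the IBM4 model used for the Dutch-English comparison adds distortion and fertility modelling on top of this.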

Component Development - SMT
Text Data

Language                    English    Afrikaans
Number of sentences         41,239     41,239
Number of words             687,154    694,455
Vocabulary size             17,898     25,623
LM perplexity w/o punct.    87.21      103.71
LM perplexity with punct.   62.28      72.28

Europarl Dutch-English with an IBM4 translation model was used for comparison, as the language pairs and domain are very similar.
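For readers unfamiliar with the perplexity figures, a minimal sketch of how perplexity is computed from per-word language model probabilities follows. The slides do not give the exact scoring setup (for example how sentence boundaries or unknown words were counted), so the function below is a generic definition and the uniform example data are hypothetical.

```python
import math


def perplexity(log_probs):
    """Perplexity from natural-log conditional word probabilities.

    log_probs: one value ln p(w_i | w_{i-2}, w_{i-1}) per scored token.
    """
    if not log_probs:
        raise ValueError("need at least one scored token")
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical check: a model that gives every token probability 1/100
# has perplexity 100, whatever the length of the text.
uniform = [math.log(1.0 / 100)] * 50
print(round(perplexity(uniform), 2))  # 100.0
```

Read this way, the Afrikaans perplexity of 103.71 without punctuation means the trigram model is, on average, about as uncertain as a uniform choice among roughly 104 words.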

Component Development - SMT
Results
                          Afrikaans-English    English-Afrikaans
                          BLEU     NIST        BLEU     NIST
IBM1 w/o punctuation      34.13    7.65        34.68    7.93
IBM1 with punctuation     36.11    7.66        34.81    7.73

Dutch-English with 740k Europarl corpus
                          Dutch-English        English-Dutch
                          BLEU     NIST        BLEU     NIST
IBM4                      26.35    -           22.85    -

Component Development - TTS
Festival was used to build a male Afrikaans voice:
  Unit-selection voice.
  Trained Letter-to-sound rules.
  Binding of units for unit-selection voice.
  Phone set is identical to the ASR phone set.
  500 Hansard-domain utterances were used to train the voice.
Afrikaans pronunciation lexicon:
  37k vocabulary size.
  No pronunciation variants.
  Syllables are marked.

Component Development - TTS
Letter-to-sound (LTS) results:
  Train set pronunciations    33,121
  Test set pronunciations     3,680
  Phones correct              97.92%
  Words correct               85.24%
LTS results are comparable to German (89.38% words correct).
It is difficult to formally evaluate Afrikaans TTS with only 2 native speakers (especially if one is the developer).
Informal evaluation was performed by simply listening to pronunciations to determine their correctness.

Integrated Translation System
Description:
  Based on One4All demo scripts developed by ISL.
  Best ASR output used as SMT input.
  Re-used existing English ASR and TTS.

Results
Afrikaans-English
                                            NIST                BLEU
System Input                       WER      Score   Rel. Imp.   Score   Rel. Imp.
TEXT w/o punct.                    0.0%     7.65    -           34.13   -
ASR w/o punct. (Adapted AMs)       20.0%    6.12    -20.0%      25.45   -25.4%
ASR w/o punct. (Unadapted AMs)     46.5%    4.56    -40.4%      17.39   -49.0%
TEXT with punct.                   0.0%     7.66    -           36.11   -
ASR with punct. (Adapted AMs)      20.0%    6.04    -21.1%      24.42   -32.4%
ASR with punct. (Unadapted AMs)    46.5%    4.40    -42.6%      16.72   -53.7%
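The "Rel. Imp." columns are consistent with the relative change of each ASR-input score with respect to the corresponding TEXT-input score. The slides do not state the formula, so this reading is an assumption; the sketch below recomputes the column from the scores in the table, with variable names chosen for illustration.

```python
def relative_change(score, baseline):
    """Relative change of score with respect to baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


# (NIST, BLEU) scores copied from the Afrikaans-English results table.
text_scores = {"w/o punct.": (7.65, 34.13), "with punct.": (7.66, 36.11)}
asr_scores = {
    ("w/o punct.", "Adapted AMs"): (6.12, 25.45),
    ("w/o punct.", "Unadapted AMs"): (4.56, 17.39),
    ("with punct.", "Adapted AMs"): (6.04, 24.42),
    ("with punct.", "Unadapted AMs"): (4.40, 16.72),
}

changes = {}
for (condition, models), (nist, bleu) in asr_scores.items():
    base_nist, base_bleu = text_scores[condition]
    changes[(condition, models)] = (
        round(relative_change(nist, base_nist), 1),
        round(relative_change(bleu, base_bleu), 1),
    )
    print(condition, models, changes[(condition, models)])
```

With adapted acoustic models, a 20.0% WER costs about a fifth of the NIST score and a quarter to a third of the BLEU score.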

Results
[Figure: Afrikaans-English S2S translation results. Two panels plot BLEU and NIST scores against WER [%] for IBM1 with punct. and IBM1 w/o punct., with points for TEXT input, Adapted AMs and Unadapted AMs.]

Results - Example translation
Reference sentence: Firstly the lack of nursing staff remains a problem.
Source sentence: Ten eerste bly die gebrek aan verpleegpersoneel 'n probleem.
Recognised sentence: Ten eerste by gebrek aan verpleegpersoneel probleem.
Machine translation of recognised sentence: Firstly at the lack of nurses problem.
Machine translation of source sentence: Firstly I am glad the lack of nurses a problem.
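To make the recognition errors in this example concrete, the sketch below computes the word error rate of the recognised sentence against the source sentence using a standard word-level Levenshtein distance. The slides do not say which scoring tool produced the reported WERs, so this is a generic illustration rather than the evaluation actually used.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, deletions and insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1]


source = "Ten eerste bly die gebrek aan verpleegpersoneel 'n probleem".split()
recognised = "Ten eerste by gebrek aan verpleegpersoneel probleem".split()

errors = word_errors(source, recognised)
wer = 100.0 * errors / len(source)
print(errors, round(wer, 1))  # 3 33.3
```

The three errors are one substitution ("bly" recognised as "by") and two deletions ("die" and "'n"), so this sentence is somewhat worse than the 20.0% average WER of the adapted system.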

Conclusions
Development Time

Component             Development Time
Speech recogniser     8
Machine translator    1
Speech synthesis      1
Integrated System     1
Evaluation            1
Total                 12

SMT in a week: Yes
Speech-to-speech translation in a week: No

Conclusions
Demonstrated rapid deployment of an S2S translation system under somewhat idealised conditions, as most of the data and development tools were readily available.
The recognition component is still the most challenging component to develop for a new language, as evidenced by the 20% WER.
Afrikaans-English SMT results are very encouraging when compared to Dutch-English, as only a simple translation model was used.
As expected, errors in recognition degrade the translation.

Future work
Use more sophisticated translation modelling and schemes.
Develop local SMT software.
Start looking at other local language pairs:
  isiXhosa-English
  Sepedi-English
Challenges:
  Ntu languages are very different from the Germanic languages.
  Ntu languages have only been written languages for ±150 years.
