Rapid Language Portability of Speech Processing Systems

Tanja Schultz
Language Technologies Institute, InterACT, Carnegie Mellon University
MULTILING, Stellenbosch, April 10, 2006

Motivation
* Computerization: speech is a key technology
  - Mobile devices, ubiquitous information access
* Globalization: multilinguality
  - More than 6,900 languages in the world
  - Multiple official languages
    - Europe has 20+ official languages
    - South Africa has 11 official languages
* Speech processing in multiple languages
  - Cross-cultural human-human interaction
  - Human-machine interface in the mother tongue
Rapid Language Portability, Tanja Schultz 2/33

Challenges
* Algorithms are language independent but require data
  - Dozens of hours of audio recordings and corresponding transcriptions
  - Pronunciation dictionaries for large vocabularies (>100,000 words)
  - Text corpora of millions of words in the various domains in question
  - Bilingual aligned text corpora
* BUT: such data are available in only very few languages
  - Audio data: 40 languages; transcriptions take up to 40x real time
  - Large-vocabulary pronunciation dictionaries: 20 languages
  - Small text corpora: 100 languages; large corpora: 30 languages
  - Bilingual corpora in very few language pairs, pivot mostly English
* Additional complications:
  - Combinatorial explosion (domain, speaking style, accent, dialect, ...)
  - Few native speakers at hand for minority (endangered) languages
  - Languages without writing systems

Solution: Learning Systems
* Intelligent systems that learn a language from the user
* Efficient learning algorithms for speech processing
* Learning:
  - Interactive learning with the user in the loop
  - Statistical modeling approaches
* Efficiency:
  - Reduce the amount of data (save time and costs): by a factor of 10
  - Speed up development cycles: days rather than months
  - Rapid language adaptation from universal models
* Bridge the gap between language and technology experts
  - Technology experts do not speak all languages in question
  - Native users are not in control of the technology

SPICE
* Speech Processing: Interactive Creation and Evaluation toolkit
* National Science Foundation, Grant 10/2004, 3 years
* Principal Investigators: Tanja Schultz and Alan Black
* Bridge the gap between technology experts and language experts
  - Automatic Speech Recognition (ASR), Machine Translation (MT), Text-to-Speech (TTS)
* Develop web-based intelligent systems
  - Interactive learning with the user in the loop
  - Rapid adaptation of universal models to unseen languages
* SPICE webpage: http://cmuspice.org

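Speech Processing Systems
[Diagram: input speech ("Hello") passes through the acoustic model (AM), the pronunciation lexicon (Lex; e.g. hi /h/ /ai/, you /j/ /u/, we /w/ /i/), and the language model (LM; e.g. "hi you", "you are", "I am"), then NLP / MT and TTS, producing output speech and text. Resources shown: phone set and speech data, pronunciation rules, text data.]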

Rapid Portability: Data
[Diagram: the speech processing pipeline (AM, Lex, LM, NLP / MT, TTS) from input speech to output speech and text, highlighting the phone set and speech data.]

GlobalPhone Multilingual Database
* Widespread languages
* Native speakers
* Uniform data
* Broad domain
* Large text resources: Internet, newspaper
* Corpus
  - 19 languages and counting: Arabic, Croatian, Turkish, Ch-Mandarin, Portuguese, Ch-Shanghai, German, French, Japanese, Korean, Russian, Spanish, Swedish, Tamil, Czech, + Thai, + Creole, + Polish, + Bulgarian, + ...???
  - 1800 native speakers
  - 400 hrs of audio data
  - Read speech
  - Filled pauses annotated
* Now available from ELRA!!

Speech Recognition in 17 Languages
[Bar chart: word error rate (%) per language, with values between 10 and 33.5. Axis labels as extracted: Japanese, German, English, Thai, Korean, Ch-Mandarin, Turkish, French, Portuguese, Croatian, Spanish, Bulgarian, Russian, Afrikaans, Chinese, Arabic, Iraqi. The assignment of values to languages is not recoverable from the transcript.]
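The chart above reports word error rate (WER). The slides do not say how scoring was done, so the following is only a minimal sketch of the standard definition (substitutions, deletions, and insertions divided by the number of reference words), computed with a word-level edit distance. The function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
example_wer = word_error_rate("hi you are here", "hi you here")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.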

Rapid Portability: Acoustic Models
[Diagram: the speech processing pipeline (AM, Lex, LM, NLP / MT, TTS) from input speech to output speech and text, highlighting the phone set and speech data used for the acoustic model (AM).]

Universal Sound Inventory
* Speech production is independent of language, hence IPA:
  - 1) IPA-based universal sound inventory
  - 2) Each sound class is trained by data sharing
* Reduction from 485 to 162 sound classes
* m, n, s, l appear in all 12 languages
* p, b, t, d, k, g, f and i, u, e, a, o appear in almost all
[Figure: decision tree over the contexts of /k/ in the German words Blaukraut, Brautkleid, Brotkorb, and Weinkarte. Yes/no questions such as "-1 = plosive?" and "+2 = vowel?" split the root k(0) into the leaves k(1) and k(2).]
* Problem: contexts of sounds are language specific. How to get context-dependent models for new languages?
* Solution:
  - 1) Multilingual context decision trees
  - 2) Specialize the decision tree by adaptation
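The data-sharing idea can be sketched as pooling language-specific phones under their shared IPA symbol. The inventories below are tiny hypothetical examples, not GlobalPhone data, and the function name is invented for illustration. The sketch only shows why the number of sound classes shrinks when languages share symbols.

```python
from collections import defaultdict

# Hypothetical toy inventories (assumption): language -> set of IPA symbols.
INVENTORIES = {
    "German": {"m", "n", "s", "l", "k", "a", "y"},
    "Spanish": {"m", "n", "s", "l", "k", "a", "ɲ"},
    "Turkish": {"m", "n", "s", "l", "k", "a", "y", "ɯ"},
}


def build_universal_inventory(inventories):
    """Map each IPA symbol to the set of languages whose data it can share."""
    shared = defaultdict(set)
    for language, phones in inventories.items():
        for phone in phones:
            shared[phone].add(language)
    return dict(shared)


universal = build_universal_inventory(INVENTORIES)

# Number of models if every language keeps its own phones separately.
language_specific_total = sum(len(phones) for phones in INVENTORIES.values())

# Sounds present in every language of the toy set.
in_all = sorted(p for p, langs in universal.items() if len(langs) == len(INVENTORIES))

print(f"{language_specific_total} language-specific phones -> {len(universal)} shared classes")
print("in all languages:", in_all)
```

A sound class that occurs in several languages gets training data from all of them, which is what makes such a model a starting point for a language with little data of its own.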

Rapid Portability: Acoustic Model
[Bar chart: word error rate (%) for the variants Ø Tree, ML-Tree, Po-Tree, and PDTS. Bar values as extracted: 69.1, 57.1, 49.9, 40.6, 32.8, 28.9, 19.6, 19. Horizontal axis labels as extracted: 0, 0:15, 0:15, 0:25, 0:25, 0:25, 1:30, 16:30.]

Project: SPICE

Rapid Portability: Pronunciation Dictionary
* Pronunciation rules, text data
* Examples: adios /a/ /d/ /i/ /o/ /s/; Hallo /h/ /a/ /l/ /o/; Phydough ???
[Diagram: the speech processing pipeline (AM, Lex, LM, NLP / MT, TTS) from input speech to output speech and text, highlighting the pronunciation lexicon (Lex).]

Phoneme- vs Grapheme-based ASR
[Bar chart: word error rate (%) for phoneme, grapheme, and grapheme (FTT) systems in English, Spanish, German, Russian, and Thai. Values as extracted: 11.5, 26.8, 24.5, 19.2, 18.4, 15.6, 14, 12.7, 33, 36.4, 32.8, 26.4, 18.3, 16. The assignment of values to bars is not recoverable from the transcript.]
* Problem: 1 grapheme ≠ 1 phoneme
* Flexible Tree Tying (FTT): one decision tree
  - Improved parameter tying
  - Less over-specification
  - Fewer inconsistencies
[Figure: a single decision tree with questions such as "0 = obstruent?", "0 = vowel?", "0 = begin-state?", "-1 = syllabic?", "0 = mid-state?", "-1 = obstruent?", "0 = end-state?" and leaves such as AX-b, AX-m, IX-m.]
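In a grapheme-based system the "pronunciation" of a word is simply its spelling, so no hand-written dictionary is needed. A minimal sketch of such a lexicon follows. The function name, the lowercasing, and the dropping of non-letter characters are assumptions for illustration, not details taken from the slides.

```python
def grapheme_lexicon(words):
    """Map each word to its letter sequence, used in place of a phoneme sequence."""
    lexicon = {}
    for word in words:
        units = [ch for ch in word.lower() if ch.isalpha()]
        if units:  # ignore entries without any letters
            lexicon[word] = units
    return lexicon


lexicon = grapheme_lexicon(["adios", "Hallo", "Phydough"])
for word, units in lexicon.items():
    print(word, " ".join(units))
```

For a regular orthography such as Spanish the letter units stay close to the sounds. An entry like "Phydough" yields eight letter units that bear little relation to how the word is spoken, which is the "1 grapheme ≠ 1 phoneme" problem the slide names.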

Dictionary: Interactive Learning
[Flowchart: starting from the word list W, select the best word w_i; generate its pronunciation P(w_i) with G-2-P and present it to the user via TTS; the user answers "P(w_i) okay?" with yes, no (improve P(w_i)), or skip; accepted pronunciations go into Lex, G-2-P is updated, and w_i is deleted from W.]
* Follows the work of Davel & Barnard
* Word list: extract from text
* G-2-P:
  - explicit mapping rules
  - neural networks
  - decision trees
  - instance learning (grapheme context)
* Update after each w_i: more effective training
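The loop on this slide can be sketched roughly as follows. Everything here is a simplification under stated assumptions: `ToyG2P` is a hypothetical stand-in for the G-2-P learners listed above (it just remembers the most frequent phoneme per letter and assumes one phoneme per letter), "select best word" is reduced to taking the next word in the list, and the user is simulated by a lookup table. It is not the SPICE or Davel & Barnard implementation.

```python
from collections import Counter, defaultdict


class ToyG2P:
    """Hypothetical toy model: most frequent phoneme seen so far for each letter."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, word):
        # Fall back to the letter itself when it has not been seen yet.
        return [self.counts[ch].most_common(1)[0][0] if self.counts[ch] else ch
                for ch in word]

    def update(self, word, phonemes):
        # Toy assumption: one phoneme per letter; other words are not learned from.
        if len(word) == len(phonemes):
            for ch, ph in zip(word, phonemes):
                self.counts[ch][ph] += 1


def build_lexicon(word_list, ask_user, g2p=None):
    """Propose P(w), let the user accept, correct, or skip, and update G2P after each word."""
    if g2p is None:
        g2p = ToyG2P()
    lexicon = {}
    remaining = list(word_list)
    while remaining:
        word = remaining.pop(0)            # stand-in for "select best word"
        proposal = g2p.predict(word)
        answer = ask_user(word, proposal)  # None = skip, else accepted or corrected phonemes
        if answer is None:
            continue
        lexicon[word] = answer
        g2p.update(word, answer)           # update after each word
    return lexicon, g2p


# Simulated user (assumption): knows the right answer for some words, skips the rest.
TRUTH = {"casa": ["k", "a", "s", "a"], "si": ["s", "i"], "saca": ["s", "a", "k", "a"]}
proposals_ok = []


def simulated_user(word, proposal):
    correct = TRUTH.get(word)
    proposals_ok.append(proposal == correct)
    return correct


lexicon, model = build_lexicon(["casa", "si", "xyz", "saca"], simulated_user)
```

In this toy run only the first word needs a correction; after the model has seen that "c" maps to /k/, the proposal for "saca" is already right. That is the effect the slide describes as more effective training when updating after each word.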

Issues and Challenges
* How to make best use of the human?
  - Definition of successful completion
  - Which words to present, in what order
  - How to be robust against mistakes
  - Feedback that keeps users motivated to continue
* How many words to be solicited?
  - G2P complexity depends on language
  - 80% coverage: hundred (SP) to thousands (EN)
  - G2P rule system perplexity:

Language     Perplexity
English      50.11
Dutch        16.80
German       16.70
Afrikaans    11.48
Italian       3.52
Spanish       1.21
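The slides do not define how the G2P perplexity in the table was computed. As an illustration only, one common way to quantify how ambiguous a spelling system is would be the perplexity 2^H, where H is the conditional entropy of the phoneme given the grapheme, estimated from aligned grapheme-phoneme pairs. The function name and the toy data below are assumptions, and the numbers it produces are not comparable to the table.

```python
import math
from collections import Counter


def g2p_perplexity(aligned_pairs):
    """Return 2 ** H(phoneme | grapheme) for a list of (grapheme, phoneme) pairs."""
    if not aligned_pairs:
        raise ValueError("need at least one aligned pair")
    joint = Counter(aligned_pairs)
    grapheme_totals = Counter(grapheme for grapheme, _ in aligned_pairs)
    total = len(aligned_pairs)
    entropy = 0.0
    for (grapheme, _), count in joint.items():
        # p(g, p) * log2 p(p | g)
        entropy -= (count / total) * math.log2(count / grapheme_totals[grapheme])
    return 2 ** entropy


# Every letter always maps to the same sound: no ambiguity.
regular = [("a", "a"), ("s", "s"), ("a", "a"), ("o", "o")]
# The letter "c" is pronounced /k/ half of the time and /s/ the other half.
ambiguous = [("c", "k"), ("c", "s")]

regular_ppl = g2p_perplexity(regular)
ambiguous_ppl = g2p_perplexity(ambiguous)
```

Under this definition a value near 1 means an almost one-to-one spelling, and larger values mean that more alternatives must be resolved per letter, so more words have to be solicited from the user.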

Rapid Portability: LM
* Resource-rich languages
* Low-resource languages: inquiry, bridge languages
* Internet / TV + automatic extraction
* Text data for the LM
[Diagram: the speech processing pipeline (AM, Lex, LM, NLP / MT, TTS) from input speech to output speech and text, highlighting the language model (LM).]

Project: SPICE

Rapid Portability: TTS
[Diagram: the speech processing pipeline (AM, Lex, LM, NLP / MT, TTS) from input speech to output speech and text, highlighting the phone set and speech data used for TTS.]

Parametric TTS
* Text-to-speech for G2P learning:
  - Technique: phoneme-by-phoneme concatenation; speech not natural but understandable (Marelie Davel)
  - Units are based on IPA phoneme examples
  - PRO: covers languages through simple adaptation
  - CON: not good enough for speech applications
* Text-to-speech for applications:
  - Common technologies
    - Diphone: too hard to record and label
    - Unit selection: too much to record and label
  - New technology: clustergen trajectory synthesis
    - Clusters representing context-dependent allophones
    - PRO: can work with little speech (10 minutes)
    - CON: speech sounds buzzy, lacks natural prosody

SPICE: Afrikaans - English
* Goal: build an Afrikaans-English speech translation system using SPICE
* Cooperation with the University of Stellenbosch and ARMSCOR
  - Bilingual PhD visited CMU for 3 months (thanks Herman Engelbrecht!!!)
* Afrikaans: related to Dutch and English, g-2-p very close, regular grammar, simple morphology
* SPICE: all components apply the statistical modeling paradigm
  - ASR: HMMs, N-gram LM (JRTk-ISL)
  - MT: statistical MT (SMT-ISL)
  - TTS: unit selection (Festival)
  - Dictionary: G-2-P rules using CART decision trees
* Text: 39 hansards; 680k words; 43k bilingual aligned sentence pairs
* Audio: 6 hours of read speech; 10k utterances, telephone speech (AST)
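The N-gram LM named above can be illustrated with a bigram model estimated from raw text. This is a minimal sketch, not the JRTk-ISL implementation: the function name, whitespace tokenization, sentence markers, and add-one smoothing are all assumptions chosen to keep the example short. The training sentences are the phrases shown in the pipeline diagram.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Return prob(prev, word) estimated from sentences with add-one smoothing."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        history_counts.update(tokens[:-1])           # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        # Add-one smoothing keeps unseen bigrams from getting probability zero.
        return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))

    return prob, vocab


prob, vocab = train_bigram_lm(["hi you", "you are", "I am"])

p_seen = prob("hi", "you")    # bigram observed in the training text
p_unseen = prob("hi", "are")  # bigram never observed
```

With so little text most bigrams are unseen, which is why the amount of available text data matters so much for the LM.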

SPICE: Time effort
* Good results: ASR 20% WER; MT A-E (E-A): BLEU 34.1 (34.7), NIST 7.6 (7.9)
* Shared pronunciation dictionaries (for ASR + TTS) and LM (for ASR + MT)
* Most time-consuming process: data preparation, so reduce the amount of data!
* Still too much expert knowledge required (e.g. ASR parameter tuning!)
[Stacked bar chart: effort in days (axis 0 to 25) for AM (ASR), Lex, LM (ASR, MT), TM (MT), TTS, and S-2-S, split into data, training, tuning, evaluation, and prototype. Values as extracted: 11, 3, 5, 8, 7, 5, 5.]

Other Projects on Multilinguality
* Constantly growing interest in multilinguality
* Major needs:
  - Information gathering from multiple sources
  - Translation requirements for multilingual communities
  - Two-way communication
* Translation of Broadcast News (BN), lectures, and meetings
  - US: GALE (DARPA), STR-Dust (NSF)
  - Europe: TC_Star (EU FP6)
* Translation in mobile communication scenarios
  - US: TransTac (DARPA), Thai ST (Laser)

Translation of Broadcast News, Lectures and Meetings
* Projects: TC_STAR (EC FP6), STR-DUST (NSF), GALE (DARPA)
* Demo: 你们的评估准则是什么 ("What are your evaluation criteria?")

Gale: Global Autonomous Language Exploitation
* Largest DARPA project in HLT (EARS + TIDES)
* Automatically process huge volumes of speech and text data in multiple languages
  - Broadcast news, talk shows, telephone conversations
  - Chinese, Arabic (+ dialectal variations), surprise languages
* Deliver pertinent information in easy-to-understand forms to monolingual analysts; 3 engines:
  - Transcription: transform multilingual speech to text
  - Translation: transform any text to English
  - Distillation: extract and present information to the English analyst

Demonstration
* Mandarin Broadcast News: CCTV, recorded in the US over satellite
* ASR: transforming the Mandarin speech into Chinese text using Automatic Speech Recognition
* SMT: translating from Chinese text into English text using Statistical Machine Translation

PDA Speech Translation in Mobile Scenarios
* Tourism: needs in a foreign country
* International events: conferences, business, Olympics
* Humanitarian needs: humanitarian, government; medical, refugee registration
* Projects: Thai ST (Laser), TransTac (DARPA)

Team effort: TransTac
* Speech Recognition (CMU / Mobile, LLC)
* Statistical MT (CMU / Mobile, LLC)
* Speech Synthesis: Swift (Cepstral, LLC)
* Graphical User Interface (Mobile, LLC)
* System runs on all platforms
  - Off-the-shelf consumer PDAs
  - Laptop/Desktop under Win/CE/Linux
  - Phraselator P2 (Voxtec)
* Interface
  - Simple and intuitive push-to-talk
  - Back translation for confirmation
* Language pairs: English-Thai + English-Arabic
* Handheld: joint optimization of speed and accuracy
  - About 1.5x real time on an 800 MHz PXA270, 128 MB RAM
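The speed figure can be read as a real-time factor, that is, processing time divided by audio duration. This reading is an assumption based on common usage, since the slide does not define the term. Under that assumption, a small worked example shows what 1.5x real time means for a user waiting on a handheld device. The function names are invented for illustration.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; 1.0 means exactly real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_time(audio_seconds: float, rtf: float) -> float:
    """Time needed to process an utterance of the given length at a given real-time factor."""
    return audio_seconds * rtf


# A 4-second utterance at 1.5x real time takes 6 seconds to process.
handheld_seconds = processing_time(4.0, 1.5)
```

The same arithmetic applies to the transcription cost quoted earlier in the talk: at up to 40x real time, one hour of audio can take up to 40 hours to transcribe by hand.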

Conclusion
* Intelligent systems to learn language
  - SPICE: learning by interaction with the (naive) user
  - Rapid portability to unseen languages
* Multilingual systems
  - Systems and data in multiple languages
  - Universal language-independent models
* Projects on multilinguality
  - Extract information from multilingual speech data
  - Speech translation in mobile scenarios
