Humanitarian Babel Fish: a user-friendly translation system for first responders (PTC Research Project)

(System diagram) Proof of concept: US English <> Cebuano. Each direction chains Audio In -> Automatic Speech Recognition -> Machine Translation -> Text-to-Speech Synthesis -> Audio Out. A fully on-board speech-to-speech translation system to facilitate cross-language communication in disaster relief scenarios.

Sponsors and Stakeholders
- PTC Honolulu
- PTC Australia
- Vonwiller Foundation
- Open Systems Education Trust
- Kirby Foundation
Pro bono contributors
- Dr Julie Vonwiller, Project Manager
- Dr James Nealand, Project Advisor
- RedR Australia
- Habitat for Humanity Australia
- Project Steering Committee

Project Tasks
- Task 1: Identify specific domains, and develop scenarios and vocabulary
- Task 2: Collect and annotate databases of Cebuano in-domain audio and Cebuano & English text; build matching Cebuano & English lexicons (APPEN)
- Tasks 3 & 4: Develop CEB and US ENG speech recognition modules (ASR)
- Tasks 5 & 6: Develop CEB <> ENG translation modules
- Task 7: Develop CEB synthesis module
- Task 8: Integrate US ENG synthesis module
- Task 9: Develop basic user interface and integrate with modules
- Task 10: Conduct trials and final testing

Domains
- Different operational stages can be classified in language processing as domains. Examples include Medical & Health, Water & Sanitation, and Security.
- For this project, we selected the Needs Assessment domain: the first activity undertaken by relief workers to gain a rapid overview of the situation, in order to efficiently manage the response to the disaster.
- Speech recognition and machine translation can be optimised by training the computer algorithms on the scenarios, dialogue and vocabulary of actual disaster relief operations:
  - Tune language models to the specific domain
  - Ensure that all relevant vocabulary is covered by the system (see the coverage sketch after this slide)
  - Ensure that the expected topics and vocabulary are well covered by the parallel corpora used to build machine translation
- Actual scenarios and training data for the Needs Assessment domain were captured by participating in RedR's training programs for relief workers
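As an illustration of the vocabulary-coverage check mentioned above, the following minimal sketch counts how many tokens in a set of domain scenario transcripts appear in the system lexicon and lists the out-of-vocabulary words. The file names and one-word-per-line formats are assumptions for the example, not the project's actual tooling.

```java
// Illustrative only: token coverage of domain transcripts against a lexicon word list.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class VocabularyCoverage {
    public static void main(String[] args) throws IOException {
        // Headwords of the pronunciation lexicon, one word per line (assumed format).
        Set<String> lexicon = new HashSet<>();
        for (String line : Files.readAllLines(Paths.get("ceb_lexicon_words.txt"))) {
            lexicon.add(line.trim().toLowerCase());
        }

        // Scenario transcripts for the Needs Assessment domain, one utterance per line.
        Set<String> missing = new TreeSet<>();
        long total = 0, covered = 0;
        for (String line : Files.readAllLines(Paths.get("needs_assessment_transcripts.txt"))) {
            for (String token : line.toLowerCase().split("[^\\p{L}'-]+")) {
                if (token.isEmpty()) continue;
                total++;
                if (lexicon.contains(token)) covered++;
                else missing.add(token);
            }
        }

        System.out.printf("Token coverage: %.2f%% (%d/%d)%n",
                100.0 * covered / total, covered, total);
        System.out.println("Out-of-vocabulary words to add: " + missing);
    }
}
```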

Data Collection
- A Cebuano language training corpus was collected by Appen over four months in the Visayan region of the Philippines
- 100 native speakers, balanced by age and gender
- Speakers were recorded in role-playing scenarios as disaster victims and first responders
- The recording environment and background noise matched likely field conditions
- Post-recording processing of the data was carried out by Appen in Davao and Sydney

Automatic Speech Recognition (ASR)
- Development of the ASR components was performed by Assistant Professor Khe Chai Sim of the National University of Singapore
- The open source package PocketSphinx from Carnegie Mellon University was selected as the ASR engine:
  - Full tool-chain available for developing new language components
  - Supports fast model architectures suitable for embedded systems; runs completely on board
  - Existing Android build available
- Speech recognition relies on two main components for a language: the acoustic model and the language model
  - For US English, a standard acoustic model released by CMU was used, and a custom, domain-specific language model was built
  - For Cebuano, both a custom acoustic model and a custom language model were built
- All processing is performed on board with no reliance on network connectivity
- Instances of the decoder in both languages are kept loaded so that there is no latency in switching between languages (see the sketch after this slide)
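A minimal sketch of the "both decoders kept loaded" design, written against the pocketsphinx-android demo API. The class and method names (SpeechRecognizerSetup, addNgramSearch, startListening) come from that demo and may differ between PocketSphinx versions; the DualLanguageAsr wrapper, model paths and search names are assumptions, not the project's actual code.

```java
// Hedged sketch: one PocketSphinx decoder per language, both loaded up front so
// switching between English and Cebuano carries no model-loading latency.
import java.io.File;
import java.io.IOException;

import edu.cmu.pocketsphinx.SpeechRecognizer;
import edu.cmu.pocketsphinx.SpeechRecognizerSetup;

public class DualLanguageAsr {
    private SpeechRecognizer englishRecognizer;
    private SpeechRecognizer cebuanoRecognizer;

    /** Build both decoders once from models shipped on the micro-SD card. */
    public void init(File modelRoot) throws IOException {
        // US English: standard CMU acoustic model + custom domain language model.
        englishRecognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(modelRoot, "en-us"))         // placeholder path
                .setDictionary(new File(modelRoot, "en-us.dict"))       // placeholder path
                .getRecognizer();
        englishRecognizer.addNgramSearch("needs_en",
                new File(modelRoot, "needs_assessment_en.lm"));         // placeholder path

        // Cebuano: custom acoustic model + custom domain language model.
        cebuanoRecognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(modelRoot, "ceb"))           // placeholder path
                .setDictionary(new File(modelRoot, "ceb.dict"))         // placeholder path
                .getRecognizer();
        cebuanoRecognizer.addNgramSearch("needs_ceb",
                new File(modelRoot, "needs_assessment_ceb.lm"));        // placeholder path
    }

    /** Start decoding in the requested language; the other decoder stays loaded but idle. */
    public void listen(boolean english) {
        if (english) {
            cebuanoRecognizer.stop();
            englishRecognizer.startListening("needs_en");
        } else {
            englishRecognizer.stop();
            cebuanoRecognizer.startListening("needs_ceb");
        }
    }
}
```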

Machine Translation (MT)
- Text-to-text translation has been developed for both directions: Cebuano to English and English to Cebuano
- The open source Moses software package was selected:
  - Typically server-based software; to the best of our knowledge we are the first to have ported Moses to Android
  - Moses model development was performed at Carnegie Mellon University by Andrew Wilkinson
- The training data consisted of parallel text corpora (English and Cebuano), derived from the domain-specific recordings and general Cebuano text
- The MT package is by far the largest component in the system. As such, we run the MT software as a separate process and communicate with the application using XML-RPC (see the sketch after this slide)
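A hedged sketch of the XML-RPC call from the application to a Moses server process, using the Apache XML-RPC client library. The "translate" method name and the "text" field follow the standard mosesserver interface, but the endpoint path, port and return handling are assumptions and may differ from the project's port of Moses.

```java
// Hedged sketch: querying a Moses server over XML-RPC from the Android application.
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class MosesClient {
    private final XmlRpcClient client = new XmlRpcClient();

    public MosesClient(String host, int port) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://" + host + ":" + port + "/RPC2"));  // assumed endpoint
        client.setConfig(config);
    }

    /** Send one source-language sentence and return the translated text. */
    @SuppressWarnings("unchecked")
    public String translate(String sourceSentence) throws Exception {
        Map<String, Object> params = new HashMap<>();
        params.put("text", sourceSentence);
        Map<String, Object> result =
                (Map<String, Object>) client.execute("translate", new Object[] { params });
        return (String) result.get("text");
    }
}

// Usage (assumed port; one Moses process per translation direction would need its own port):
//   MosesClient cebToEng = new MosesClient("127.0.0.1", 8080);
//   String english = cebToEng.translate(sourceText);
```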

Speech Synthesis (TTS)
- The text-to-speech system uses the CMU Festival Lite (Flite) system
- The English TTS is an existing US English voice
- The Cebuano TTS was created by Prof Alan Black and Andrew Wilkinson of CMU, using training data provided by Appen
- The training data included high-quality recordings of 2,000 phonetically rich sentences by a female Cebuano voice talent, supplemented by phonemic lexicons of general and domain-specific Cebuano words
- An existing build of Festival Lite for Android was available, but we modified its Java Native Interface (JNI) layer to add a number of features (see the sketch after this slide):
  - Faster switching between voices: essentially, two voices are kept loaded simultaneously
  - New voices can be added by copying the voice file to the micro-SD card
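To illustrate the kind of JNI layer described above, here is a hypothetical wrapper that keeps two Flite voices loaded. None of these native method names come from the project or from the Flite sources; the library name, voice file names and signatures are all assumptions made for the sketch.

```java
// Hypothetical sketch of a JNI wrapper around Flite with two voices kept loaded.
public class FliteBridge {
    static {
        System.loadLibrary("flite_jni");   // assumed shared-library name
    }

    // Native methods that would be implemented in C against the Flite API (assumed signatures).
    private static native long nativeLoadVoice(String voiceFilePath);
    private static native void nativeSynthesize(long voiceHandle, String text, String wavOutPath);
    private static native void nativeUnloadVoice(long voiceHandle);

    private long englishVoice;
    private long cebuanoVoice;

    /** Load both voices once so that switching languages does not reload from disk. */
    public void init(String sdCardVoiceDir) {
        englishVoice = nativeLoadVoice(sdCardVoiceDir + "/cmu_us_voice.flitevox");   // placeholder
        cebuanoVoice = nativeLoadVoice(sdCardVoiceDir + "/ceb_voice.flitevox");      // placeholder
    }

    /** Synthesize text with whichever voice matches the current output language. */
    public void speak(String text, boolean english, String wavOutPath) {
        nativeSynthesize(english ? englishVoice : cebuanoVoice, text, wavOutPath);
    }

    public void shutdown() {
        nativeUnloadVoice(englishVoice);
        nativeUnloadVoice(cebuanoVoice);
    }
}
```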

Integration
- The application is a regular Android application (shipped as a .apk file)
- The ASR and TTS packages are built as shared libraries with a Java Native Interface (JNI) layer for communication with the application. Existing Android builds of PocketSphinx and Festival Lite were available, so these were modified as needed
- Moses runs as a separate process and communicates with the application using XML Remote Procedure Calls (XML-RPC). The components of Moses needed for our application were ported to Android; to our knowledge we are the first to build Moses for Android
- ASR, TTS, and MT models are shipped as a set of files on a micro-SD card, and logs are written to micro-SD
  - Convenient during development and trials
  - Ultimately we will add the ability to fetch components from a remote server and to store components on internal flash
- The user interface presents unique challenges: two users interact with the system in different languages, and while the aid worker can be trained to use the application, the victim will be seeing an unfamiliar system for the first time
- A sketch of how the three modules fit together in one conversational turn follows this slide
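The following sketch wires the illustrative wrappers from the earlier slides (DualLanguageAsr, MosesClient, FliteBridge) into one conversational turn: recognized speech in one language is translated and spoken in the other, then recognition is re-armed for the reply. These are the assumed classes from the previous sketches, not the project's actual application code.

```java
// Hedged sketch of one turn through the ASR -> MT -> TTS pipeline.
public class TranslationTurn {
    private final DualLanguageAsr asr;
    private final MosesClient cebToEng;
    private final MosesClient engToCeb;
    private final FliteBridge tts;

    public TranslationTurn(DualLanguageAsr asr, MosesClient cebToEng,
                           MosesClient engToCeb, FliteBridge tts) {
        this.asr = asr;
        this.cebToEng = cebToEng;
        this.engToCeb = engToCeb;
        this.tts = tts;
    }

    /**
     * One turn: the utterance already decoded by ASR in the speaker's language is
     * translated and then spoken in the listener's language.
     */
    public void relay(String recognizedText, boolean speakerIsEnglish, String wavOutPath)
            throws Exception {
        String translated = speakerIsEnglish
                ? engToCeb.translate(recognizedText)
                : cebToEng.translate(recognizedText);

        // Speak in the listener's language (the opposite of the speaker's).
        tts.speak(translated, !speakerIsEnglish, wavOutPath);

        // Re-arm recognition in the listener's language for the reply.
        asr.listen(!speakerIsEnglish);
    }
}
```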

Modular Performance
ASR
- On a portion of the Appen corpus held out from training, word accuracy of 80.31% was achieved for US English and 85.39% for Cebuano (the English speakers were not native US English speakers)
- Preliminary experiments show that we can achieve a 2% absolute improvement in word accuracy using Maximum Likelihood Linear Regression adaptation; this would require users to engage in a short enrolment session
MT
- English -> Cebuano: BLEU 38.7%
- Cebuano -> English: BLEU 47.9%
TTS
- Mean Opinion Score (MOS) of 4.48 (out of 5), which is well within the target performance
- Word error rate was assessed using semantically unpredictable sentences (SUS) played to a native Cebuano speaker, who was required to write down what he heard; the SUS word error score was 4.5%, also within the target error rate of <30% (see the scoring sketch after this slide)
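The word-accuracy and SUS word-error figures above are standard word-level edit-distance scores (accuracy = 1 - WER). The sketch below is an illustrative re-implementation of that scoring, not the project's evaluation script; the example sentences are made up.

```java
// Word accuracy via Levenshtein (edit) distance over word tokens.
public class WordAccuracy {

    /** Minimum number of substitutions, insertions and deletions to turn hyp into ref. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Word accuracy = 1 - WER = 1 - (errors / reference length). */
    static double wordAccuracy(String reference, String hypothesis) {
        String[] ref = reference.trim().toLowerCase().split("\\s+");
        String[] hyp = hypothesis.trim().toLowerCase().split("\\s+");
        return 1.0 - (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // One deleted word out of seven -> accuracy ~0.857 (WER ~14.3%).
        System.out.println(wordAccuracy("is there clean water in the village",
                                        "is there water in the village"));
    }
}
```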

The Future
- End-to-end performance testing, using pre-recorded audio as well as live field tests
- Software engineering:
  - Improvement of the user interface
  - Complete the port of Moses to enable use of models in binary form
  - Add support for speaker adaptation
- Languages and domains:
  - Text and audio data collection in additional domains and languages
  - Prove that the recipe developed can be used to rapidly build out new languages and domains
- Partnerships and deployment