Regional Winner paper in CSI-YITPA(E) 2002


Bengali text-to-speech synthesis system, a novel approach for crossing literacy barrier

Shyamal Kr. DasMandal & Barnali Pal
Electronics Research & Development Center of India, Calcutta
A Scientific Society of the Ministry of Communications and Information Technology, Government of India
Plot E-2/1, Block GP, Sector-V, Bidhannagar, Kolkata 700 091, India

Abstract: In this age of information technology, methods of information exchange that overcome human limitations have gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to be able to carry out spoken dialogue with computers. This involves the integration of speech input/output technologies and language technologies. Speech synthesis is the automatic generation of an artificial speech signal by a computer. In the last few years this technology has become widely available for several languages on platforms ranging from personal computers to stand-alone systems. If the vocabulary is very limited, very natural speech is possible by merely concatenating stored speech units. Most voice response systems, such as those for paying bills by telephone, apply this method, which is much simpler than a real speech synthesizer. A true text-to-speech (TTS) system should be able to accept any input text in the chosen language, including new words and typographical errors. This paper reports the successful development of a text-to-speech synthesis system for the Bengali language at ER&DCI, Calcutta. The steps involved and the problems encountered in developing such a system are highlighted. The paper concludes with a description and demonstration of reading an on-line newspaper from a website, a typical application of this technology. The system can help the general public overcome the literacy barrier, empower the visually impaired population, and improve man-machine interaction through on-line newspaper reading from the Internet.

1 Introduction:

Voice technology applications have created a growing demand for multi-lingual, multi-voice, multi-style speech synthesis systems and, above all, for a natural-sounding voice close to the quality of prerecorded speech. An unlimited continuous speech synthesizer is one that can convert any given text into continuous speech within the realm of a language and is not restricted by vocabulary or syntax. Many techniques are available for speech synthesis, such as formant synthesis, concatenative synthesis and articulatory synthesis. Our application uses the concatenative approach. The concatenative model uses prerecorded samples of different lengths derived from natural speech, and is probably the easiest way to produce intelligible and natural-sounding synthetic speech.

One of the most important aspects of concatenative synthesis is finding the correct unit length for the speech components. The selection is usually a trade-off between longer and shorter units. With longer units, high naturalness, fewer concatenation points and good control of co-articulation are achieved, but the number of required units and the memory needed increase. With shorter units less memory is needed, but collecting and labeling the samples becomes more difficult and complex. In concatenative synthesis the speech units are usually words, syllables, demi-syllables, phonemes, and sometimes even triphones. In the present system, partnemes are mainly used as units. The advantage of the partneme as the basic unit over all others is the simplicity of introducing intonation and prosodic rules into the synthesized speech signal.

A TTS system built with this technology delivers good-quality speech output and can be deployed in many kinds of application, such as reading on-line newspapers from the Internet, overcoming the literacy barrier, empowering the visually impaired population, and enhancing other information systems.

2 Bengali Text-to-Speech Synthesis system:

Fig-1 gives a schematic block diagram of a Bengali TTS. The system consists of two main blocks: a) Text analyzer b) Synthesizer.

[Figure: block diagram. Text Analyzer block: Bengali Text; Text Analyzer; Prosodic & Intonation information; Phonological Rules and Exceptional word list; Phoneme String with Prosody & Intonation Parameter. Synthesizer block: Phonetic Synthesizer; Partneme Signal Dictionary; Segmentation of the Speech signal; Synthesized speech output.]

Fig.1

2.1 Text Analyzer

The input text is essentially a string of characters. It may be data from a word processor, standard ASCII or ISCII text from e-mail, on-line newspaper text, or scanned text from newspapers. The first task of the text analyzer is to convert the input text into a linguistic representation, i.e. grapheme-to-phoneme conversion. This conversion is highly language dependent and requires language-dependent phonological, prosodic and intonation rules. Text containing digits and numerals is converted into full words based on number-system rules.

2.2 Synthesizer

In a concatenative synthesis system, speech is produced by taking the phoneme string and the intonation and prosody information as input. In this approach the quality of the synthesizer output depends mainly on the quality of the information contained in the basic building block, i.e. the partneme dictionary, which holds parts of phonemes as the basic sound units. Partnemes include vowels, consonants, consonant-vowel transitions, vowel-consonant transitions and vowel-vowel transitions. The technique implemented here for concatenating these basic sound units to produce synthetic speech is called the Epoch Synchronous Non-Overlap Add (ESNOLA) method. Two major requirements are observed while generating partnemes: (i) the pitch of every unit should, ideally, remain the same, and (ii) amplitude normalization is to be done depending on the nature of the vowel part of the signal. To satisfy the first criterion, pitch detection and any necessary modification are carried out.

Intonation, stress, rhythm, duration etc. are called prosodic or suprasegmental elements and are required for introducing naturalness into the synthesized speech. They are in turn related to fundamental frequency, segmental duration and complexity variation.
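
The partneme inventory described earlier in this section (steady-state vowels and consonants plus consonant-vowel, vowel-consonant and vowel-vowel transitions) suggests how a phoneme string might be expanded into dictionary units before concatenation. The following is a minimal Python sketch under stated assumptions: the `VOWELS` set, the `a>b` naming of transition units, the toy dictionary contents and both function names are hypothetical. The plain end-to-end joining shown here is not the ESNOLA method, whose epoch-synchronous details the paper does not give.

```python
from typing import Dict, List, Sequence

# Hypothetical toy vowel inventory for illustration; not the Bengali phoneme set.
VOWELS = {"a", "e", "i", "o", "u"}


def unit_sequence(phonemes: Sequence[str]) -> List[str]:
    """Expand phonemes into steady-state units plus CV, VC and VV transitions."""
    units: List[str] = []
    for i, current in enumerate(phonemes):
        units.append(current)  # steady-state vowel or consonant unit
        if i + 1 < len(phonemes):
            following = phonemes[i + 1]
            # The paper lists only transitions that involve a vowel,
            # so consonant-consonant pairs get no transition unit here.
            if current in VOWELS or following in VOWELS:
                units.append(f"{current}>{following}")
    return units


def concatenate(units: Sequence[str], dictionary: Dict[str, List[float]]) -> List[float]:
    """Join the stored waveform of each unit end to end, without overlap."""
    signal: List[float] = []
    for name in units:
        signal.extend(dictionary[name])  # a KeyError flags a missing unit
    return signal
```

For the toy input `["k", "a", "l"]`, `unit_sequence` yields the five units `k`, `k>a`, `a`, `a>l`, `l`. A real system would additionally apply the pitch and amplitude conditioning that the section requires of each stored unit.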

To make a computer speak like a human while reading a text, it is necessary to make the computer understand the intended meaning and the emotional and physical state of the speaker, using some form of artificial intelligence. By controlling duration and specifying fundamental frequency contours, we can introduce prosody into synthetic speech.

3 Applications of Bengali TTS system

The developed speech synthesizer finds many applications. Some of them are described below:
(i) Reading newspapers from the Internet.
(ii) Conveying information to people over the telephone or over a local broadcasting system (village center). The information may be the arrival or departure of a train/plane, share or commodity prices, weather, the name and address against any telephone number, etc.
(iii) Reading children's fables.
(iv) Helping barely literate people, or people not conversant with English, to receive information from a computer.
(v) Empowering the visually impaired population.
(vi) Reading e-mail over telephone/cell phone.
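
Application (i) depends on separating readable text from page markup before anything reaches the synthesizer. The sketch below shows one way this could be done with Python's standard `html.parser` module. It is an illustration under stated assumptions, not the authors' program: the class and function names are hypothetical, and the choice of `h1` to `h3` as headline markup and `p` as body markup is an assumption about the page structure.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class BlockExtractor(HTMLParser):
    """Collect the text of headline and paragraph elements as separate blocks."""

    HEADLINE_TAGS = {"h1", "h2", "h3"}  # assumed headline markup
    BODY_TAGS = {"p"}                   # assumed body-text markup
    SKIP_TAGS = {"script", "style"}     # content that must never be spoken

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, str]] = []  # (kind, text) in page order
        self._open_tag: Optional[str] = None     # block element being collected
        self._parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.HEADLINE_TAGS or tag in self.BODY_TAGS:
            self._open_tag = tag
            self._parts = []

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._open_tag:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                kind = "headline" if tag in self.HEADLINE_TAGS else "body"
                self.blocks.append((kind, text))
            self._open_tag = None

    def handle_data(self, data):
        # Text inside inline tags such as <a> is kept with its enclosing block.
        if self._open_tag is not None and self._skip_depth == 0:
            self._parts.append(data)


def extract_blocks(html: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs, where kind is 'headline' or 'body'."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Keeping headlines and body text as an ordered list of blocks makes it straightforward to pass only the headlines, or one block at a time, to a synthesizer.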

3.1 Reading News paper from Internet

The newspaper is a very important medium for gathering information about recent events. A machine that can read the newspaper aloud greatly helps this purpose. Our system currently provides this facility in a very intelligible manner. It offers the following features for reading an on-line Bengali newspaper from the Internet:
a) Headline reading, by which one can listen to only the headlines of the newspaper.
b) Block-wise reading of the newspaper in the following fashion: i) First block ii) Next block iii) Previous block.
c) Stop and Resume of the reading.

Using hyperlinks, the reader can also go to the relevant news details, which the synthesizer can then read out. The program first decodes the text information from the HTML source code of the newspaper page, then uses our Bengali text-to-speech synthesis system to read out the text of the selected portion of the web page. The schematic block diagram of the integrated system for reading an on-line newspaper from a website is shown below.

[Figure: block diagram. Newspaper site; Downloading the newspaper at the client site; Extracting text information from HTML source code; Decoding the text information; Bengali TTS system; Synthesized output.]

Fig.2 Block Diagram of Integrated system for on-line news paper reading

4 Conclusion

The output of the Bengali synthesizer developed by ER&DCI, Calcutta is fairly clear, with a good degree of phonetic naturalness. The integrated system for reading the on-line AnandaBazar Patrika will enable people to get information through the Internet conveniently and efficiently. This will attract more people to take advantage of IT-enabled services.

This system will also be helpful for the physically impaired population who cannot use speech as their primary means of communication. With some modifications this system may be used for other languages.

Acknowledgements

We thank Mr. C. N. Ajit and Mr. A. Bandyopadhyay, ER&DCI, Calcutta for their valuable suggestions in preparing the document.