ROLE OF POS TAGGING IN TEXT TO SPEECH SYNTHESIS. AJU SAMUEL THOMAS LDCIL, CIIL, MYSORE

ROLE OF POS TAGGING IN TEXT TO SPEECH SYNTHESIS AJU SAMUEL THOMAS LDCIL, CIIL, MYSORE ajuthomas2008@gmail.com prsamthomas@gmail.com

INTRODUCTION
POS tagging is an essential step in natural language processing. It is used in many applications, such as machine translation and information retrieval.

POS Tagging: Definition
The process of assigning a part-of-speech or lexical class marker to each word in a corpus.
WORDS: the koala put the keys on the table
TAGS: N, V, P, DET
Tagged: the/DET koala/N put/V the/DET keys/N on/P the/DET table/N
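The definition above can be illustrated with a minimal lookup tagger. This is only a sketch: the toy lexicon, the function name `tag`, and the default tag for unknown words are assumptions made for illustration, and real taggers use context rather than a fixed word list.

```python
# Toy lexicon covering only the example sentence (an illustrative assumption).
TOY_LEXICON = {
    "the": "DET",
    "koala": "N",
    "keys": "N",
    "table": "N",
    "put": "V",
    "on": "P",
}


def tag(sentence, lexicon=TOY_LEXICON, default="N"):
    """Return (word, tag) pairs; words missing from the lexicon get `default`."""
    return [(word, lexicon.get(word.lower(), default)) for word in sentence.split()]


tagged = tag("the koala put the keys on the table")
# Show the result in the common word/TAG notation.
print(" ".join(f"{word}/{pos}" for word, pos in tagged))
```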

As mentioned earlier, POS tagging is an indispensable part of natural language processing. It is also applicable to the development of a text-to-speech system. How? This is explained shortly.

What is a Text to Speech System?
Automatic conversion of arbitrary or unrestricted natural language sentences from their text form into their spoken form.

A text-to-speech system must be: able to read any text; intelligible; natural sounding.

How does TTS work?

The NLP block: converts the input text into a sequence of sound units with a set of specifications.
Speech inventory: a database of sound units.
DSP block: appropriate sound units are selected from the speech inventory and then concatenated to produce the output speech (concatenative synthesis).
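The selection-and-concatenation step of the DSP block can be sketched as follows. The inventory contents, the unit names, and the function `synthesize` are hypothetical; the sample values are placeholders, and a real system would also smooth the joins between units.

```python
# Hypothetical speech inventory: unit name -> recorded samples (placeholder values).
INVENTORY = {
    "k": [0.1, 0.2],
    "ae": [0.3, 0.4, 0.5],
    "t": [0.0, -0.1],
}


def synthesize(units, inventory=INVENTORY):
    """Concatenate the stored samples of each requested unit, in order."""
    waveform = []
    for unit in units:
        # A KeyError here means the inventory has no recording for the unit.
        waveform.extend(inventory[unit])
    return waveform


# The NLP block is assumed to have produced this unit sequence.
print(synthesize(["k", "ae", "t"]))
```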

TTS Architecture
Raw text in
Text Analysis: text normalization, part-of-speech tagging, homonym disambiguation
Phonetic Analysis: dictionary lookup, grapheme-to-phoneme (LTS)
Prosodic Analysis: boundary placement, pitch accent assignment, duration computation
Waveform synthesis
Speech out

THE TEXT PROCESSING ASPECT OF SPEECH SYNTHESIS
Text processing breaks the input of text-to-speech synthesis into units suitable for further processing. It includes steps such as expanding abbreviations, part-of-speech (POS) tagging, and applying letter-to-sound rules.
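As one example of these text processing steps, abbreviation expansion can be sketched as a table lookup. The abbreviation table and the function name `expand_abbreviations` are assumptions for illustration; a practical normalizer needs context, since an abbreviation such as "St." can stand for either "Street" or "Saint".

```python
# Illustrative abbreviation table (an assumption, not taken from the slides).
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "St.": "Street",
    "etc.": "et cetera",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each whitespace-separated token found in the table by its expansion."""
    return " ".join(table.get(token, token) for token in text.split())


print(expand_abbreviations("Dr. Rao lives on Main St."))
```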

HOMOGRAPH DISAMBIGUATION IN POS TAGGING FOR SPEECH SYNTHESIS
Many languages have homographs, i.e. words that share a spelling but have different pronunciations. POS tagging can resolve this ambiguity by assigning the right tag according to the context in which the word occurs. E.g. in English, the pronunciation of some homographs changes with the category: REcord (noun) vs. reCORD (verb), INsult (noun) vs. inSULT (verb).
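The stress shift in such noun/verb pairs can be sketched once the POS tag is known. The syllable table, the tag names, and the function `mark_stress` are assumptions for illustration; the rule applies only to pairs of this kind, not to English words in general.

```python
# Hand-syllabified entries for the two example homographs (illustrative assumption).
SYLLABLES = {
    "record": ["re", "cord"],
    "insult": ["in", "sult"],
}

# For these pairs, nouns stress the first syllable and verbs the second.
STRESSED_SYLLABLE = {"NOUN": 0, "VERB": 1}


def mark_stress(word, pos):
    """Return the word with its stressed syllable written in upper case."""
    syllables = SYLLABLES[word.lower()]
    stressed = STRESSED_SYLLABLE[pos]
    return "".join(
        syl.upper() if i == stressed else syl for i, syl in enumerate(syllables)
    )


print(mark_stress("record", "NOUN"), mark_stress("record", "VERB"))
```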

Morphological analysis can determine the part of speech (POS) for many words, but some words have multiple possible POS categories. Without POS information, the pronunciation may be ambiguous, e.g. "lives". POS is also used to predict prosody.

The lexicon entries have three parts: 1. Head word 2. POS 3. Phonemes
The POS is sometimes necessary to distinguish homographs.
HEAD    POS    PHONEMES
LIVES   NOUN   l ai v z
LIVES   VERB   l i v z
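A lexicon of this shape can be sketched as a dictionary keyed on the (head word, POS) pair. The phoneme symbols follow the example entries as far as they can be read, and the function names are hypothetical.

```python
# Lexicon keyed on (head word, POS); phoneme symbols are illustrative.
LEXICON = {
    ("lives", "NOUN"): ["l", "ai", "v", "z"],
    ("lives", "VERB"): ["l", "i", "v", "z"],
}


def phonemes_for(word, pos, lexicon=LEXICON):
    """Look up phonemes by (head word, POS); raises KeyError if there is no entry."""
    return lexicon[(word.lower(), pos.upper())]


def pronounce_tagged(tagged, lexicon=LEXICON):
    """Map (word, POS) pairs to phoneme lists, skipping words not in the lexicon."""
    result = []
    for word, pos in tagged:
        key = (word.lower(), pos.upper())
        if key in lexicon:
            result.append((word, lexicon[key]))
    return result


print(phonemes_for("lives", "NOUN"))
print(phonemes_for("lives", "VERB"))
```

Keying on the pair rather than on the spelling alone is what lets the POS tag select between the two pronunciations.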

POS Tagging and Phrasing
POS tagging is also useful for the CONTENT/FUNCTION word distinction, which in turn supports phrasing in text-to-speech synthesis.
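One simple way to use the content/function distinction for phrasing is to start a new phrase whenever a function word follows a content word. The sketch below assumes this heuristic, a small set of function-word tags, and the function name `split_phrases`; none of these come from the slides, and production systems use richer boundary models.

```python
# Tags treated as function words (an assumption for this sketch).
FUNCTION_TAGS = {"DET", "P", "CONJ", "PRON", "AUX"}


def split_phrases(tagged):
    """Group (word, tag) pairs into phrases, breaking before a function word
    that directly follows a content word."""
    phrases, current = [], []
    previous_was_content = False
    for word, pos in tagged:
        is_function = pos in FUNCTION_TAGS
        if is_function and previous_was_content and current:
            phrases.append(current)
            current = []
        current.append(word)
        previous_was_content = not is_function
    if current:
        phrases.append(current)
    return phrases


sentence = [
    ("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
    ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N"),
]
print(split_phrases(sentence))
```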

Suprasegmental aspects of Speech Synthesis. Prosody versus POS Tagging
Prosody may reflect various features of the speaker or the utterance: the emotional state of the speaker; the form of the utterance (statement, question, or command); the presence of irony or sarcasm; emphasis, etc.

In terms of acoustics, the prosody of a language involves variation in syllable length, loudness, pitch, and the formant frequencies of speech sounds. Orthographic conventions that mark prosody include punctuation (commas, exclamation marks, question marks, scare quotes and ellipses).

Punctuation such as commas, exclamation marks, question marks, scare quotes and ellipses occurs naturally in a corpus. Even a quotation mark can carry meaning, so punctuation must be given due attention during POS tagging. This helps the process of text-to-speech synthesis.
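A tokenizer that keeps punctuation as separate tagged tokens, instead of discarding it, might look like the sketch below. The tag names, the pause-cue labels, and the function `tag_punctuation` are assumptions for illustration; the regular expression is deliberately simple and would split contractions such as "don't".

```python
import re

# Match an ellipsis first, then a word, then any other single punctuation mark.
TOKEN_RE = re.compile(r"\.\.\.|\w+|[^\w\s]")

# Hypothetical prosodic cues attached to punctuation tokens.
PAUSE_CUES = {
    ",": "short_pause",
    ".": "long_pause",
    "?": "long_pause",
    "!": "long_pause",
    "...": "hesitation",
}


def tag_punctuation(text):
    """Return (token, tag, cue) triples; the cue is None when no cue is defined."""
    result = []
    for token in TOKEN_RE.findall(text):
        if re.fullmatch(r"\w+", token):
            result.append((token, "WORD", None))
        else:
            result.append((token, "PUNCT", PAUSE_CUES.get(token)))
    return result


print(tag_punctuation("Wait... really?"))
```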

Stress versus POS Tagging
Stress is the relative emphasis given to certain syllables in a word, or to certain words in a phrase or sentence. The stress placed on syllables within words is called word stress or lexical stress. The stress placed on words within sentences is called sentence stress.

Along with assigning a tag to each word during POS tagging, it is useful to mark the stress on each word in a sentence. Sentence stress can also be marked. Although this is tedious and time-consuming, it serves the needs of text-to-speech synthesis.

Tone versus POS Tagging
Tone, in the sense of intonation, is used in ordinary speech. It is unrelated to the lexical tone of tonal languages. In communication, people routinely use tone as part of expressing a concept. Rising tone and falling tone are both common.

While doing POS tagging, it is useful to add markers for rising and falling tone in the text. These markers make the speech more natural when the text is converted into speech, and they help the appropriate tone appear in the synthesis output. Annotators tagging their own language can be expected to know the tone of a given text from its context, since they are well versed in their mother tongue, and can place the rising and falling tone markers accordingly.
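Where no human annotation is available, a rough first guess at such markers can be derived from punctuation. The sketch below assumes a common simplification for English: yes/no questions rise, while statements and wh-questions fall. The marker names, the wh-word list, and the functions are hypothetical, and a native annotator's judgment would override this heuristic.

```python
import re

# English wh-words; questions starting with one are treated as falling.
WH_WORDS = {"what", "who", "whom", "whose", "where", "when", "why", "how", "which"}


def tone_marker(sentence):
    """Return "RISE" for a question not starting with a wh-word, else "FALL"."""
    words = sentence.strip().split()
    first = words[0].lower() if words else ""
    if sentence.strip().endswith("?") and first not in WH_WORDS:
        return "RISE"
    return "FALL"


def annotate(text):
    """Split text at sentence-final punctuation and attach a tone marker to each sentence."""
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    return [(s.strip(), tone_marker(s)) for s in sentences]


print(annotate("Did you lock the door? I put the keys on the table."))
```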

TONE IN ACOUSTIC FORMAT
The tone of an utterance has two components: the global pitch contour shape and localised pitch accents. Most utterances show an overall downward trend in f0, called declination: as we run out of breath, air flow and pressure decrease and the vocal folds vibrate more slowly.

Tone: Base line and Top line
Not only does the mean value of f0 decrease with time, the range does too.
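Declination with a narrowing range can be sketched as two straight lines, a base line and a top line, where the top line falls faster than the base line. The linear shape, the default frequencies in Hz, and the function name `declination` are illustrative assumptions, not measured values.

```python
def declination(t, duration, base_start=120.0, base_end=100.0,
                top_start=200.0, top_end=150.0):
    """Return (base line, top line) f0 values in Hz at time t of an utterance.

    Both lines fall linearly over the utterance; the top line falls faster,
    so the available pitch range shrinks as well.
    """
    # Clamp the position to the interval [0, 1] of the utterance.
    fraction = min(max(t / duration, 0.0), 1.0)
    base = base_start + (base_end - base_start) * fraction
    top = top_start + (top_end - top_start) * fraction
    return base, top


for moment in (0.0, 1.0, 2.0):
    base, top = declination(moment, duration=2.0)
    print(f"t={moment:.1f}s base={base:.1f} top={top:.1f} range={top - base:.1f}")
```

Pitch accents would then be placed as local excursions between the two lines.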

Tone: Pitch accents

So if markers for rising and falling tone are added during POS tagging, they help greatly in producing natural-sounding output when the text is converted into speech.