A novel approach for Concatenative Speech Synthesis using Phonemic and Syllabic Transcription

International Journal of Scientific Research Organization, Volume 1, Issue 1, Feb. 2017
Research Paper. Available online at www.ijsro.com. e-ISSN: 2456-6942

Dr. P. Dhanalakshmi (1), Dept. of Computer Science and Engineering, Annamalai University, Chidambaram, India. e-mail: abidhana01@gmail.com
Dr. S. Ananthi (2), Dept. of Computer Science and Engineering, Samskruti College of Engg. and Technology, Hyderabad, India. e-mail: ananthi68@gmail.com

Abstract

Speech recognition and speech synthesis play an important role in human-machine interaction. Synthesized speech is produced by concatenating pieces of pre-recorded speech utterances drawn from a database. The proposed work converts written text into a phonemic and a syllabic transcription, and then converts both representations into modified waveform clips that are combined to produce sound. The transcription attempts to describe the individual variations that occur between speakers of a dialect or language. The proposed system aims to record the phonemes and syllables that a speaker uses rather than the actual spoken variants of those sounds that are produced when a speaker utters a word. Speech synthesis is the process of concatenating pre-recorded utterance units to produce speech. This paper describes ongoing work in the field of speech synthesis based on concatenation of waveform units. The concatenative speech synthesis method produces highly intelligible speech.
Keywords: Concatenative Speech Synthesis; Concatenate wave segments; Phoneme; Phonemic representation; Phonemic transcription; Syllable; Syllable transcription; Speech Processing; Speech Synthesis (SS); Text Normalization; Text to Speech (TTS) Conversion; Waveform Concatenation

1 INTRODUCTION

In recent years a great deal of research has been carried out in the field of speech processing, and the field has improved enormously. Speech is the most natural and effective method of communication between human beings. It is not easy to quickly review, retrieve and reuse speech documents if they are simply recorded as an audio signal. Hence, transcribed speech is expected to become an essential capability for the IT era. High recognition accuracy, however, is more easily obtained for speech that is generated from text.

A Text to Speech (TTS) system converts regular language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech. A computer system used for this purpose is termed a speech synthesizer. Speech Synthesis (SS) refers to the capability of the computer to render normalized text as machine-generated speech. In general, speech synthesis functions as a medium that converts text into speech. Systems differ in the size of their stored database depending on their requirements or applications. The recent trend is to collect a huge database of fluent speech.

An effective method for Text to Speech conversion is the concatenation method. In this method, the system selects an optimal sequence of acoustic units at run time to synthesize a particular utterance. In this work, syllable units are considered because of their growing importance. In SS, a text document is given as input to the system, and it may contain non-standard formats such as digits/integers, numerical expressions and cardinal suffixes. Syllabic text is extracted from the normalized form of the given input.
Text normalization, syllabic transcription and phonemic transcription are discussed in Section 2.

Copyright 2016, IJSRO, All Rights Reserved

2 CONCATENATIVE SPEECH SYNTHESIS

A Text to Speech (TTS) conversion system converts normal language text into utterances. Concatenative speech synthesis produces speech by concatenating pieces of recorded speech that are stored in advance in a database; to convert text into speech sound, the system must identify every speech unit uniquely. Any text may contain words with a special pronunciation, which should be stored in a lexicon. For this conversion the system uses text normalization, which is performed internally within the synthesizer. Concatenative speech synthesis converts the written text into a syllabic or phonemic representation and then converts that representation into waveforms. These waveforms are combined to produce sound. The framework of the proposed system is shown in Fig. 1. A speech unit can be either a fixed-size diphone or a variable-length unit such as a syllable or a phone.

[Figure 1: Architecture of Text to Speech Machine. Blocks: Text Input; Text Normalization (unstructured text into structured text); Syllabic Transcription and Phonemic Transcription (actual utterance sound); Prosody Generation (placing pauses in between speech); Waveform Concatenation (combining waveform of the input text); Speech.]

2.1 TEXT NORMALIZATION

A Text to Speech synthesizer works internally by synthesizing words. However, input text documents contain not only words but also an assortment of written elements such as statistics, dates, times, abbreviations, numbers and symbols. If the text contains abbreviations and numbers, the system must determine how these non-standard elements should be read out. Any text that has a special pronunciation, such as abbreviations, acronyms and special symbols, should be stored in a lexicon. All such diverse elements must first be converted into their general or actual utterances, and only then can the system synthesize them as speech. This conversion of diverse units into actual utterances, which takes place internally within the synthesizer, is called text normalization. In general, text normalization is the conversion of text that includes non-standard words, such as statistics, abbreviations and misspellings, into normal words. Each literal symbol is recognized as a unit. These symbols should be separated from adjacent text by white space.

Table 1: General Text Normalization

Type | Text Normalization | Representation
Digits/Integer | | 0-9
Alphabetic character | alpha_char | a-z, A-Z
Phrase or Sentence | word_char | <alpha_char> - _ 0-9
Cardinal suffix | | st, nd, rd, th
Numerical Expression | | Integer expression, Cardinal suffix, Floating expression

Open and close brackets, equals, greater-than, dollar, percent, pound sign, comma and similar characters are termed literal symbols. Phrases in a context consist of literal symbols and numerical representations. The expansion of every literal and numerical representation has to be declared in advance so that non-structured text can be converted into a structured representation.

2.2 SYLLABLE

A syllable is a basic unit of spoken language consisting of an uninterrupted sound that can be used to form words. In other words, a syllable is the sound of a vowel that is created when pronouncing a word. The word vowel comes from the Latin word vocalis, which means vocal. According to phonetics, a vowel is the sound produced when the pressure wave generated from the lungs passes through the vocal tract without any interruption. In English, the letters A, E, I, O and U represent sounds generated without any constriction by other vocal organs.

2.3 SYLLABIC TRANSCRIPTION

COUNTING THE SYLLABLES:
1. Count the number of vowels present in the isolated word.
2. Subtract the silent vowels from the count.
3. Subtract one vowel for every diphthong, that is, consider each diphthong as a single vowel.

SYLLABIFICATION RULES: There are a few rules for dividing words into syllables. There are four ways to split up a word into its syllables:
Rule 1: Divide the word between two middle consonants.

Rule 2: Usually divide before a single middle consonant.
Rule 3: Divide before the consonant before an "-le" syllable.
Rule 4: Divide off any compound words, prefixes, suffixes and roots which have vowel sounds.

According to Rule 1, split up words which have two middle consonants, for example "hap/pen", "let/ter" and "din/ner". "Happen" has the two consonants "pp" in the middle of the word; "letter" and "dinner" likewise have two middle consonants. The only exceptions are the consonant digraphs. Never split up consonant digraphs, as they really represent only one sound. The exceptions are "th", "sh", "ph", "ch" and "wh".

According to Rule 2, when there is only one middle consonant, the word usually divides in front of it. "o/pen", "i/tem", "e/vil" and "re/port" are a few examples of Rule 2. The only exceptions are those cases where the first syllable has an obvious short sound, as in "cab/in".

According to Rule 3, when a word has the old-style spelling in which the "-le" sounds like "-el", divide before the consonant before the "-le". For example: "a/ble", "fum/ble", "rub/ble", "mum/ble" and "this/tle". The only exceptions are "ckle" words like "tick/le".

According to Rule 4, split off the parts of compound words, as in "sports/car" and "house/boat". Divide off prefixes, as in "un/happy", "pre/paid" and "re/write". Also divide off suffixes, as in "farm/er", "teach/er", "hope/less" and "care/ful". In the word "stop/ping", the suffix is actually "-ping", because this word follows the rule that when "-ing" is added to a word with one syllable, the last consonant is doubled before the "-ing".

As an example, consider the word Repair. It contains three vowel letters (e, a and i), so the vowel count is three. The vowels a and i are placed together to form a single sound. Hence ai is a diphthong, it is considered a single vowel, and the vowel count is reduced to two.
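The counting steps and Rule 1 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function names are hypothetical, a silent vowel is assumed to be a word-final "e" after a consonant (other than in "-le"), and every pair of adjacent vowel letters is assumed to form a diphthong.

```python
VOWELS = set("aeiou")
DIGRAPHS = {"th", "sh", "ph", "ch", "wh"}  # never split (Rule 1 exception)


def count_syllables(word: str) -> int:
    """Estimate the syllable count with the three counting steps."""
    w = word.lower()
    # Step 1: count the vowel letters.
    n_vowels = sum(ch in VOWELS for ch in w)
    # Step 2: subtract silent vowels (assumption: only a final 'e'
    # that follows a consonant and is not part of an "-le" ending).
    silent = 0
    if len(w) > 2 and w.endswith("e") and w[-2] not in VOWELS and not w.endswith("le"):
        silent = 1
    # Step 3: subtract one vowel for each adjacent vowel pair
    # (assumption: adjacent vowel letters form a diphthong).
    merged = sum(1 for a, b in zip(w, w[1:]) if a in VOWELS and b in VOWELS)
    return max(1, n_vowels - silent - merged)


def split_rule1(word: str) -> str:
    """Apply Rule 1: divide between two middle consonants, except digraphs."""
    w = word.lower()
    for i in range(1, len(w) - 2):
        pair = w[i:i + 2]
        between_vowels = w[i - 1] in VOWELS and w[i + 2] in VOWELS
        two_consonants = pair[0] not in VOWELS and pair[1] not in VOWELS
        if between_vowels and two_consonants and pair not in DIGRAPHS:
            return w[:i + 1] + "/" + w[i + 1:]
    return w  # Rule 1 does not apply


print(count_syllables("Repair"))  # 2
print(split_rule1("happen"))      # hap/pen
```

Such letter-based heuristics will miscount many English words, which is one reason a lexicon of special pronunciations remains necessary.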
As a result, the word Repair contains two vowel sounds. After counting the vowels, proceed to divide the word into syllables. According to Rule 4, the word contains a prefix, so it is separated as the single syllable Re. Because the vowel count is two, there can be only two syllables, so the process terminates. Finally, there are two syllables, Re and pair.
=> Re/pair

As another example, consider the word Indian.
Syllable count: number of vowel sounds -> 2 (i and ia); ia is a diphthong, so it is considered a single vowel sound. Therefore, syllable count -> 2.
Dividing the word into syllables: find whether any prefix or suffix is present in the word and separate it as a syllable. The word Indian contains the prefix in, so it is separated as a syllable. There are only two syllables, so the remaining characters are assigned to the other syllable.
=> In/dian

2.4 PHONEMIC TRANSCRIPTION

Phonemic transcription attempts to depict the individual differences that arise between speakers of a language. Phonemic/phonetic transcription aims to record the phonemes that a speaker uses rather than the real spoken variants of those phonemes that are produced when a speaker makes an utterance. A phoneme is an abstract linguistic unit that exists entirely in the mind of the speaker; phonemes could be represented by any arbitrary set of symbols. The most widely accepted set of symbols is the International Phonetic Alphabet (IPA). In normal practice the IPA is used to represent both phonemes and allophones, although it is defined in terms of actual speech sounds. Each phoneme is a group of sounds that are actually uttered. These phonemic transcriptions are used by the speech synthesis system for the conversion of text to speech.

Table 2: Phonemic Transcription

Standard Representation | Phonemic Transcription
to | tə
is | iz
text | tekst
cat | kæt
here | hir
speech | spiːtʃ
conversion | kənˈvərʒən
transcription | trænˈskrɪpʃən

2.5 PROSODY

Prosody is the pitch, volume and speed with which words, sentences and phrases are spoken. The system may also need to split the input into smaller chunks of text to determine which words need to be emphasized. The term prosody covers both pitch and the placement of pauses in speech, which make synthetic speech sound natural. It is very easy for a human speaker to pause at any place in speech, but it is complicated for a machine to produce such sounds. Prosody plays an important role in guiding the listener to the speaker's attitude towards the message [1]. Prosody consists of the systematic perception and recovery of speaker intentions based on pauses, pitch, rate and loudness [2]. Pauses are used to indicate phrases and to separate two words. Pitch refers to the rate of vocal fold cycling as a function of time. Rate denotes syllable duration and timing, and loudness represents the relative amplitude or volume.

Initially, the engine identifies the beginning and end of sentences. The pitch is likely to fall near the end of a statement and rise for a question. Similarly, the machine starts speaking either phonemes or syllables, the pitch falls on the last word, and pauses are then placed between the sentences for clear reading. Hence, the prosodic system for Text to Speech must provide suitable pauses for the speech. It must also provide adequate information to make the pitch sound realistic. Every Text to Speech engine has to convert the list of syllables and their volume, pitch and duration into digital audio. The TTS engine generates the digital audio by concatenating pre-recorded syllables or phonemes that are stored in a database. The combined pre-recorded smallest units of sound are given to the TTS engine, which speaks the sentence aloud.

Duration: The rule-based method is the commonly used method for computing the duration of a phrase or sentence.
The time duration between sentences or phones determines the clarity of the speech, so durational assignment plays a vital role in text to speech conversion. Each phone is pronounced with a different duration by the user. The duration d of a phone is expressed as

$$ d = d_{\min} + r \, (\bar{d} - d_{\min}) $$

where
\bar{d} = average duration of the phone
d_{\min} = minimum duration of the phone
r = correction

Pauses and Pitch: Pauses are mainly used in running text that is generated as utterance output. In usual systems, the reliable locations designated for inserting a pause are the punctuation symbols [3]. For every full stop or comma in a sentence, a pause has to be placed so that the phrases between them are read completely. That is, the text allows pausing for some duration whenever a comma or full stop is found in the sentence. A silence sound Sil is therefore placed for a few milliseconds, and then a connection C has to be made to continue with the following phrase. Generally, speech synthesis engines need to express their usual pitch patterns within the broad limits specified by a pitch markup.

Tone: The prosodic parameters of tones are used to generate the voice output. The tone is determined by calculating TILT. TILT is calculated directly from F0. From the acoustic aspect, tone is directly represented by the shape of the fundamental frequency (F0) contour. The tone shape is represented by

$$ \mathrm{tilt} = \frac{|A_{rise}| - |A_{fall}|}{|A_{rise}| + |A_{fall}|} $$

where
A_{rise} = amplitude of rise (in Hz)
A_{fall} = amplitude of fall (in Hz)

3 WAVEFORM CONCATENATION

A concatenative TTS system produces very natural sounding speech, since it simply joins prerecorded segments or units to form sentences. Speech generated by this approach inherently possesses natural quality. The system generates speech by searching for appropriate combinations of sounds in a large database of human speech.
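Joining prerecorded units and inserting the silence Sil at punctuation can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described here: the sample rate, the pause lengths, the function names and the sine tones that stand in for recorded syllable clips are all hypothetical.

```python
import math

SAMPLE_RATE = 16000              # assumed sampling rate in Hz
PAUSE_MS = {",": 200, ".": 400}  # assumed pause lengths in milliseconds


def tone(freq_hz: float, ms: int) -> list[float]:
    """Stand-in for a recorded clip: a sine tone lasting `ms` milliseconds."""
    n = SAMPLE_RATE * ms // 1000
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]


def silence(ms: int) -> list[float]:
    """The silence unit Sil: zero-valued samples lasting `ms` milliseconds."""
    return [0.0] * (SAMPLE_RATE * ms // 1000)


def concatenate(tokens: list[str], database: dict[str, list[float]]) -> list[float]:
    """Join the clip of every unit in order, placing Sil at punctuation."""
    signal: list[float] = []
    for tok in tokens:
        if tok in PAUSE_MS:
            signal.extend(silence(PAUSE_MS[tok]))
        else:
            signal.extend(database[tok])  # KeyError if the unit is missing
    return signal


# Toy database with two syllable units, then the word "waveform" plus a stop.
database = {"wave": tone(220.0, 100), "form": tone(330.0, 150)}
signal = concatenate(["wave", "form", "."], database)
print(len(signal) / SAMPLE_RATE)  # total duration in seconds
```

A plain join like this leaves audible discontinuities at unit boundaries, which is why a practical system adjusts the duration and pitch of the clips before combining them.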

The required syllabic fragments are found in the database, and hence joining or modifying certain speech units can be avoided. The best combinations are found and concatenated. The synthesis machine follows these steps to convert text into speech:

Step 1: The text dialogue is normalized and the sentences are converted into words.
Step 2: The syllabic text representation is obtained from the normalized text, which falls under 44 syllabic sound categories.
Step 3: The syllabic text representation is converted into syllabic clips, which are selected from the speech database.
Step 4: The duration and pitch of the syllabic clips are changed according to their respective positions.
Step 5: The modified syllabic clips are concatenated to form the individual words.
Step 6: These isolated words are combined, with the respective pauses placed between them.

4 EXPERIMENTAL RESULTS

The quality of the synthetic voice is measured through formal listening tests of the voice generated by the system. In the initial stage, conversion from non-structured to structured text is performed using text normalization. Table 3 shows a sample conversion of normal text into normalized text.

Table 3: Text Normalization for non-structure representation

Expression Type | Normal Text (Unstructured) | Normalized Text (Structured)
Date | 06/08/1988 | Sixth August Nineteen Eighty Eight: 6<digit>, th<Cardinal suffix>, august<word>, 1988<digit>
Tel Number | 9876543210 | Tel{Nine, Eight, Seven, Six, Five, Four, Three, Two, One, Zero}: individual numbers have to be read (int)
Vehicle Number | TN 45 | T<word_char>, N<word_char>, Forty Five<digit>
Time (Hour:Minute:Sec) | 12:53:49 | Time{Twelve Hours Fifty Three Minutes and Forty Nine Seconds}

Initially, the given input is normalized and the sentences are broken into words. Next, the syllables present in the words and their positions are analyzed by the system for syllabic transcription, and likewise for phonemic transcription. After finding the syllable or phoneme positions, the system searches the database for the appropriate sound by finding the position of the pronounced syllable. The syllable positions in the database were 17 (for the syllable wave) and 8 (for the syllable form), as shown in Fig. 4. Finally, the syllable sounds are combined to form the speech utterance.

A set of 100 sentences was selected from the database as a training set, and training was carried out on every item in it. The testing data were chosen from the database so that they were not included in the training set: a set of 75 sentences that do not appear in the training set was selected from the speech database.

The rank given to the voice decides the quality of the synthetic voice. Twenty persons were chosen to test the converted speech. The average of the scores given by these persons is used for ranking and is termed the Mean Opinion Score (MOS). Three categories of speech were considered for measuring the quality: speech synthesized using 2 minutes, 3 minutes and 5 minutes of speech was heard by different users. The quality of the speech is measured by the rank given by these 20 users. The rank is measured on 4 different scales.
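The Mean Opinion Score itself is a plain average of listener ratings, which the following sketch illustrates. The ratings are made-up placeholders on a 0 to 5 scale, not results from this work, and treating "Average" as the mean of the other criteria is an assumption, as is the function name.

```python
from statistics import mean

# Hypothetical ratings from four listeners on a 0-5 scale.
# These are placeholders for illustration, NOT data from the paper.
ratings = {
    "Precision": [4, 5, 4, 3],
    "Naturalness": [3, 4, 4, 3],
    "Pleasant": [4, 4, 5, 3],
}


def mean_opinion_scores(ratings_by_criterion: dict[str, list[int]]) -> dict[str, float]:
    """Return the MOS per criterion plus their overall average."""
    scores = {name: mean(values) for name, values in ratings_by_criterion.items()}
    # Assumption: "Average" is the mean of the per-criterion scores.
    scores["Average"] = mean(scores.values())
    return scores


for name, score in mean_opinion_scores(ratings).items():
    print(f"{name}: {score:.2f}")
```

In the listening test described above, the same computation would be repeated for each of the three speech categories (2, 3 and 5 minutes) with the ratings of all 20 listeners.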

[Figure 7: Mean Opinion Score for Synthesized Speech. Axis: MOS, 0 to 5. Labels: 2 Mins, 3 Mins, 5 Mins; Precision, Naturalness, Average, Pleasant.]

The applications of Text to Speech conversion include telecommunications, information systems, aids for visual disability, language teaching, and analysis by synthesis of pathological voices.

5 CONCLUSION

Speech synthesis plays a vital role in human-machine interaction. The concatenative method is a very widely used speech synthesis method. A large collection of data was gathered and stored in a database for further use. Synthesized speech is produced by concatenating pieces of recorded speech that are stored in the database. Speech synthesis based on the concatenative method generates speech utterances that are similar to the human voice. The quality of the synthesized speech is analyzed on the basis of its speed, its accuracy and its similarity to the human voice. Naturalness, precision and pleasantness, which are the most commonly used criteria for high-quality speech, were also analyzed.

REFERENCES

[1] Atal, B. S., and Hanauer, S. L., Speech analysis and synthesis by linear prediction of the speech wave, The Journal of the Acoustical Society of America, 50, (1971), 637-655.
[2] Badin, P., and Fant, G., Notes on vocal tract computation (Tech. rep.), STL-QPSR, (1984).
[3] Banks, G. F., and Hoaglin, L. W., An experimental study of duration characteristics of voice during the expression of emotion, Speech Monographs, 8, (1941), 85-90.
[4] Björn Schuller, Zixing Zhang, Felix Weninger and Felix Burkhardt, Synthesized speech for model training in cross-corpus recognition of human emotion, Int J Speech Technol, 15, (2012), 313-323.
[5] Campbell, N., Developments in corpus-based speech synthesis: approaching natural conversational speech, IEICE Transactions, 87, (2004), 497-500.
[6] Campbell, N., Conversational speech synthesis and the need for some laughter, IEEE Transactions on Audio, Speech, and Language Processing, 17(4), (2006), 1171-1179.
[7] Campbell, N., Hamza, W., Hog, H., and Tao, J., Editorial special section on expressive speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, 14, (2006), 1097-1098.
[8] Carlson, R., Sigvardson, T., and Sjolander, A., Data-driven formant synthesis (Tech. rep.), TMH-QPSR, (2008).
[9] Clark, R. A. J., Richmond, K., and King, S., Multisyn: open-domain unit selection for the Festival speech synthesis system, Speech Communication, 49, 317-330.
[10] Courbon, J. L., and Emerald, F., A text to speech machine by synthesis from diphones, in Proc. ICASSP, PTR: Upper Saddle River, NJ, (2002).
[11] Jong Kuk Kim, Hern Soo Hahn and Myung Jin Bae, On a Speech Multiple System Implementation for Speech Synthesis, Wireless Pers Commun, 49, (2009), 533-543.
[12] Linear Predictive Speech Processing, http://www.iua.upf.es/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf
[13] Mahwash Ahmed and Shibli Nisar, Text-to-Speech Synthesis using Phoneme Concatenation, International Journal of Scientific Engineering and Technology, Vol. No. 3(2), (2014), 193-197.
[14] Speech Processing: Theory of LPC Analysis and Synthesis, http://cnx.org/content/m10482/2.18.
[15] "Vowel". Online Etymology Dictionary. Retrieved 21st November 2013.
[16] http://www.etymonline.com/index.php?allowed_in_frame=0&search=vowel&searchmode=nl
[17] S. Ananthi and P. Dhanalakshmi, Syllable based concatenative synthesis for text to speech conversion, Computational Intelligence in Data Mining, Springer India Publishing, vol. 3, pp. 65-73, January 2014.