CHAPTER 1 INTRODUCTION

1. INTRODUCTION

Speech is a multifaceted system whose scientific study draws on several disciplines, which makes speech science a challenging field. In the form of an acoustic waveform, speech carries the linguistic message from speaker to hearer, and human beings depend on speech as well as text for their day-to-day communication. Speech is a complex process whose parameters fluctuate continuously. Because of this dynamic nature, even a single speaker cannot utter the same word twice in exactly the same manner, and it is difficult to analyse the speech parameters and obtain static values for a theoretical account of speech. The scientific study of speech therefore demands close attention and subject knowledge from disciplines such as acoustics, psychology, statistics and computer science.

Speech conveys the speaker's intended message, the speaker's attitude and the content of the subject under discussion. Moreover, speech is considered the primary mode of human communication. Remarkably, there is no organ dedicated solely to the production of speech: the lungs, larynx, tongue, nose, lips and teeth have the primary function of sustaining life through breathing, tasting and eating.

With regard to human speech production and perception, one may ask why speech is such a complicated process from a biological perspective. Even a single speaker's utterances vary from one occasion to another because of factors such as state of mind and emotion. Another important question is how we perceive or recognize speech units within this multifaceted system. The scientific explanation is that pressure fluctuations in the surrounding air set the eardrum vibrating.
The middle ear transmits this vibrational energy to the inner ear, where the cochlea converts the mechanical vibration into nerve signals. Some areas in the left or right hemisphere of the brain, such as Broca's area, Wernicke's area and the arcuate fasciculus, are then able to detect these nerve signals and recognize speech. Given the complexity of this biological process, a scientist is limited in how fully the biological activities of speech can be modelled in a knowledge base.

Intonation, one of the important suprasegmental features, is among the more complex properties of speech, and its theoretical and scientific study has gained immense popularity in the technological field. In order to model intonation patterns for speech applications, one should bear in mind that intonation, like other speech properties, is a complex and dynamic process.

Intonation is defined as the speech melody that arises in a speaker's speech from pitch fluctuation. Pitch fluctuation is closely related to variation in fundamental frequency (f0); for this reason, intonation is also defined as the change of f0 over time. f0 reflects the frequency of vibration of the vocal cords, and this vibration can be measured as an f0 contour in the time domain. Some linguists regard f0 contours or pitch levels as the vital information for determining the basic properties of intonation in speech. Pitch is perceived by speaker and hearer at different levels in comprehending the message, and it depends basically on the rate of vibration of the vocal cords within the larynx. Pitch is a perceptual unit, whereas f0 is an acoustic unit; in perceiving pitch, a listener can judge whether an utterance is high or low, and can also judge the voice quality of the speech. Intonation has been described as the fluctuation of voice pitch as applied to the whole sentence. Languages generally use pitch variation to express discourse meaning and emotional or attitudinal meaning.

Some researchers argue that there is a clear-cut distinction between prosody and intonation, while others consider the distinction minute. Prosody refers to suprasegmental features (i.e. those above the segmental level) or the rhythmic pattern of language, and it also covers the intonation aspect of language. In addition to intonation, prosodic analyses usually handle temporal (duration) and air pressure (intensity) parameters.
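The link between the rate of vocal cord vibration and an f0 contour in the time domain can be illustrated with a small sketch. It is not the method used in this thesis: it assumes a simple autocorrelation peak picker, uses synthetic sine tones in place of recorded speech, and all function names, the 8 kHz sample rate and the 75 to 500 Hz search range are illustrative assumptions.

```python
import math


def synth_tone(f0_hz, sample_rate, n_samples):
    """Generate a sine tone as a stand-in for a voiced speech frame (assumption)."""
    return [math.sin(2 * math.pi * f0_hz * i / sample_rate) for i in range(n_samples)]


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate f0 of one frame from the strongest autocorrelation peak.

    The 75-500 Hz search range is an assumed range, not a value from the text.
    The frame should be longer than the longest lag searched.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The best lag is the period in samples; f0 is its reciprocal in Hz.
    return sample_rate / best_lag


def f0_contour(signal, sample_rate, frame_len, hop):
    """Track f0 over time by estimating it frame by frame."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


# A tone that steps from 200 Hz to 250 Hz, mimicking a rise in pitch.
signal = synth_tone(200.0, 8000, 400) + synth_tone(250.0, 8000, 400)
contour = f0_contour(signal, 8000, frame_len=400, hop=400)
```

For these synthetic tones the contour holds one value near 200 Hz followed by one near 250 Hz. Real speech needs voicing detection and protection against octave errors, which dedicated tools such as those named in the methodology handle internally.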
It is also true that speech parameters such as duration and intensity are essential acoustic features with a significant impact on the intonation of an utterance. Ladd (1996) defines prosody as the suprasegmental features that convey sentence-level pragmatic meaning. Sentences usually convey lexical meaning, but combined with prosodic features they give additional meaning to the utterance. Crystal (1975) says that prosodic features are meaningful contrastive units in speech that arise from pitch, loudness and duration. In human communication, prosodic features play a key role in speech production and perception, and they make the system of communication productive and accurate. Since speech is considered the primary mode of human communication, knowledge of prosodic features is essential for studying human communication in particular and speech in a scientific way.

The study of intonation falls into two broad categories. One is the qualitative, perceptual or psychological level of study; the other is the quantitative level. Perceptual analysis was the first approach to this research problem, and the study later moved to the quantitative or acoustic level. The acoustic analysis of intonation became popular because of its scientific and quantitative nature and the measurable results it yields. To study the basic and advanced features of human speech and the human communication system, one should know the prosodic features of speech, especially intonation, thoroughly. In conclusion, since speech technology is an emerging technological field in the current era, intonation modelling and its scientific study occupy a significant research space in modern Natural Language Processing technology.

1.1. AIM OF THE STUDY

The aim of this study is to explore Malayalam intonation with the help of speech software and to find the intonation patterns of different sentence types in Malayalam, i.e. interrogative sentences, declarative sentences, etc.

1.2. OBJECTIVE OF THE STUDY

The objective of the present research is to develop formal rules for Malayalam intonational phonology and phonetics. The phonetic features of Malayalam intonation are used to frame the rules of Malayalam intonational phonology. One important goal is to carry out a statistical analysis of speech parameters such as fundamental frequency (f0) and duration at the syllable level. In short, this study gives a formal model of Malayalam intonation; the model and the results of the experiment can be useful for text-to-speech systems, automatic speech recognition and other speech technology applications.

1.3. HYPOTHESES OF THE STUDY

The present study focuses on the following hypotheses.

In sentence-level analysis of intonation, pitch level and pitch terminal, together commonly called the pitch contour, form different intonation patterns for different types of Malayalam sentence, such as declarative sentences, yes/no interrogative sentences, question-word interrogative sentences, imperative sentences and debitive sentences (see endnote 1). It will nevertheless be possible to find a uniform intonation pattern for all declarative sentences, and likewise for each of the other sentence types mentioned above.

The intonational phonology of Malayalam, with reference to Malayalam syllable rules and structure, frames the intonation patterns, and this model is more appropriate for creating fundamental frequency (f0) contours.

The analysis distinguishes the meaning-bearing unit, i.e. the semantics, of one sentence type from another by considering their different intonation patterns.

In the statistical analysis of speech parameters, the f0 values and pitch contours of different speakers show more similarity than difference. Pitch variation and durational variation will show both similarities and dissimilarities across speakers, but the similarities outweigh the dissimilarities.

1.4. SCOPE OF THE STUDY

Although intonation study has a significant role in theoretical linguistics, this research will also be useful to technological disciplines such as computational linguistics, forensic linguistics and speech science. In computational linguistics, the phonetic modelling of intonation in speech synthesis and speech recognition is one of the challenging tasks in overcoming the robotic quality of machine speech and achieving natural speech. The implementation of prosody (mainly intonation and duration) in text-to-speech systems has recently opened up different research areas in intonation. To achieve intelligibility and naturalness in speech synthesis systems, artificial intelligence approaches such as neural networks can be used to model the intonation pattern.
Although all these approaches have limitations in giving naturalness to speech, some have succeeded to some extent in computing the perceptual quality of intonation. A newer approach incorporates prosodic knowledge into feature extraction for speech recognition, so intonation study has recently gained immense popularity in speech recognition research. Forensic linguistics, an emerging research field, exploits knowledge of intonation for speaker identification: intonation is among the vital information that determines the intrinsic properties of a speaker's speech. The study of intonation and its application in speech has also been contributing to research in speech and hearing science.

Most speech technology applications, such as automatic speech recognition, speaker recognition and text-to-speech systems, are designed using various statistical approaches. An analytical and statistical study of intonation parameters, especially f0 and duration, will therefore be useful for modelling speech application systems designed on statistical knowledge. Moreover, intonation research is an ever-growing subject that explores speech in all directions, and this research also tries to tackle problems in speech technology.

1.5. METHODOLOGY

The research methodology is primarily concerned with the collection and analysis of speech data for intonation study. Various sentence types, such as interrogative and declarative sentences, as well as emotive sentences, were selected for recording. To achieve high audio quality, the sentences were recorded at a sampling frequency of 48 kHz and downsampled to 22 kHz. The speech samples were then quantized at 16 bits per sample. Seven female and seven male speakers were selected to record the different sentence types, and a test-battery paragraph on a news topic is also included in the speech data. The instrumental analysis was carried out on a computer with the help of speech software. The following software was used.

Cool Edit: a software tool used to record the speech data and also for noise reduction and downsampling.
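The downsampling and quantization steps described above can be sketched as follows. This is only an illustration under stated assumptions, not a description of what Cool Edit does internally: the function names are hypothetical, the resampler uses plain linear interpolation without the anti-aliasing low-pass filter a production resampler would apply, and the target rate is passed as a parameter because the text gives it only as "22 kHz".

```python
import math


def resample_linear(samples, rate_in, rate_out):
    """Resample by linear interpolation (illustrative; no anti-aliasing filter)."""
    n_out = int(len(samples) * rate_out / rate_in)
    out = []
    for k in range(n_out):
        pos = k * rate_in / rate_out          # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integer sample values."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))            # clip out-of-range values
        out.append(int(round(x * 32767)))
    return out


# 10 ms of a 100 Hz tone at 48 kHz, standing in for a recorded sentence.
recorded = [math.sin(2 * math.pi * 100.0 * i / 48000) for i in range(480)]
downsampled = resample_linear(recorded, 48000, 22000)
pcm = quantize_16bit(downsampled)
```

With 16 bits per sample, each value falls in the range -32768 to 32767; the sketch scales by 32767 so that positive and negative full scale are symmetric.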
One important consideration before recording was that the speech data should contain most of the prosodic information. Pitch information is extracted from the speech waveform with the help of the Speech Analyzer software. The pitch information determines the intonation contours from which the intonation patterns of the various sentence types are built. Emotional sentences also yield intonation contours for emotions, but it is difficult to map intonation contours onto different emotional states. In short, the analysis and experimentation produce a formal model of intonation as output, suitable for various applications.
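Once pitch and duration values have been measured per syllable, the statistical analysis named in the objectives could be summarised along the following lines. The sketch is hypothetical: the syllable labels and numbers are invented purely for illustration and are not measurements from this study, and the choice of mean, standard deviation and range is an assumption about which statistics are of interest.

```python
from statistics import mean, stdev

# Invented example rows: (syllable label, duration in ms, mean f0 in Hz).
syllables = [
    ("syl1", 180.0, 210.0),
    ("syl2", 150.0, 230.0),
    ("syl3", 210.0, 190.0),
]


def summarise(rows):
    """Return basic descriptive statistics for duration and f0."""
    durations = [duration for _, duration, _ in rows]
    f0_values = [f0 for _, _, f0 in rows]
    return {
        "duration_mean_ms": mean(durations),
        "duration_sd_ms": stdev(durations),   # sample standard deviation
        "f0_mean_hz": mean(f0_values),
        "f0_sd_hz": stdev(f0_values),
        "f0_range_hz": max(f0_values) - min(f0_values),
    }


summary = summarise(syllables)
```

Computing the same summary for each speaker would give one way to compare similarity and dissimilarity in pitch and duration across speakers, as the hypotheses propose.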

Speech Analyzer is a Windows program designed to assist users in speech analysis. It was developed by the Summer Institute of Linguistics (SIL), USA, for computational analysis to extract the acoustic properties of speech sounds. Among its acoustic measurements, Speech Analyzer performs fundamental frequency, spectrographic and spectral analysis, and duration measurements. One important application of this software is annotation work, such as building annotated speech corpora. Speech Analyzer has been shown to extract the acoustic properties of sounds accurately.

Cool Edit, developed by Syntrillium Software Corporation, Phoenix, AZ 85082-2255, USA, is used to record the speech data for intonation study. It can also be used to examine frequency components and other details through its Frequency Analysis, Statistics and Spectral View features.

Acoustic analysis was also done with the PRAAT speech software, developed by Paul Boersma and David Weenink of the University of Amsterdam, Netherlands. The methodology adopted in this research is scientific in nature, which is why the analysed Malayalam intonation patterns are presented as descriptive and formal rules of linguistic analysis, usable even for prosody implementation.

1.6. ORGANISATION OF THE THESIS

The thesis is organized into five chapters. The first chapter, 'Introduction', discusses the key concepts of speech science and intonation, and also deals with the aim, scope and methodology of the present study. The second chapter, 'Speech science and Intonation', focuses on acoustic phonetics, the prosodic features of intonation, and speech science and its application in the technical field. It also gives a broad idea of speech technology from fundamental to advanced concepts.
The third chapter, 'Review of Literature', reviews previous work in intonation theory and intonation modelling; the review is scientific and critical in nature. The fourth chapter, 'Malayalam Intonation Analysis', investigates the intonation patterns of Malayalam with reference to different types of sentences. It gives a broad description of intonational phonology and the phonetic features of intonation. The statistical analysis of speech parameters and the observations drawn from it also form part of the fourth chapter. The fifth chapter, 'Conclusion', gives a brief discussion of the subjects presented in this thesis, and the research findings are discussed in a scientific manner.

1.7. LIMITATIONS OF THE STUDY

The present study focuses mainly on the intonation and duration features of Malayalam speech. For prosody implementation in a speech system, other prosodic features such as tempo (the speed of speech), rhythm and voice quality should also be considered and incorporated into text-to-speech systems or other speech applications. The present study did not give much attention to these features, owing to the space and time limitations of the thesis. Another limitation is that the study may not be able to deal with dialectal variation in speech: it is mainly concerned with the standard dialect of Malayalam only.

ENDNOTES

1. Debitive sentence: The debitive is a sentence type that expresses the mood of a sentence. Malayalam sentences such as nii pookaṇam and nii pookaṇṭa express a strong command; here -aṇam and -aṇṭa express the mood or modality of the verb. This sentence type is described in R. E. Asher's book "Malayalam".