Analysis of Importance of the prosodic Features for Automatic Sentence Modality Recognition in French in real Conditions

Size: px
Start display at page:

Download "Analysis of Importance of the prosodic Features for Automatic Sentence Modality Recognition in French in real Conditions"

Transcription

1 Analysis of Importance of the prosodic Features for Automatic Sentence Modality Recognition in French in real Conditions PAVEL KRÁL 1, JANA KLEČKOVÁ 1, CHRISTOPHE CERISARA 2 1 Dept. Informatics & Computer Science University of West Bohemia, Plzeň, CZECH REPUBLIC 2 LORIA UMR 7503, Vandoeuvre-les-Nancy, FRANCE Abstract: - This paper deals with the measure of importance of the prosodic features for automatic sentence modality recognition in French in real conditions. We start by analysing the problem of subjectivity of manual labeling of corpus. Then, we show the results of automatic sentence modality recognition by only two prosodic features: fundamental frequency (F0) and energy. The global accuracy (ACC) is not sufficient for our application: animate a talking head [1] for deaf and hearing-impaired children by information about the sentence type. Next, we analyse the corpus for explaining these results. We consider, that prosodic features are sufficient only for prosodic question detection with accuracy greater than 80 %. For recognition of other modalities with accuracy over 80 %, we need other informations, as language model or semantic. Key-Words: - prosody, fundamental frequency (F0), energy, automatic sentences modality recognition (ASMR), modal corpus. 1 Introduction The main objective of this work is the analysis of the measure of basic prosodic features: fundamental frequency (F0) and energy according to results of automatic sentences modality recognition in French in real conditions. The real condition, means the spontaneous speech from French broadcast news evaluation. This study is performed in the area of developing application to help deaf and hearing-impaired children to better understand and to be integrated in classrooms with normal-hearing children. The classic automatic recognition systems transform speech signal into words sequency, which forms the sentence. It is not sufficient for conversation, because the information about sentence type is lost. Our study will complete this information in the system. 2 Short review of modality recognition approaches The basic rule concerning the prosody of French sentences modality is summarized as in [2]: Declarative sentence: small decrease of melody, Imperative sentence: melody, important decrease of Interrogative sentence: increase of melody, Grammar interrogative sentence: neutral intonation. Another variante of French sentences modalities [3] is a sentences distinction in only two classes: declarative or interrogative. Imperative sentence is appearing as a variant of declarative sentence. Very few papers have paid attention to sentence type recognition in French, but much more studies are about other languages and particularly about English. In the published works, the following features are used: F0 contour in [4] for German, F0 and energy in [5] for German and English, F0 and energy in [6] for Czech, F0 and duration of the ending suffix in [7] for standard Korean. Another work [8] investigates many other prosodic attributes that are mostly derived from F0, energy and duration, for example, the max, min, mean and standard deviation of F0, the energy mean and standard deviation and the number of frames in utterance and number of frames of F0. The features are computed

2 on the whole sentence and also on the last 200 ms of each sentence. The authors conclude that the end of sentences is the most important for modality recognition. In the literature, the following classification methods have been tested and compared for sentence type recognition: Neutral Network (NN) [4, 6, 9], Hidden Markov Models (HMMs) [8] and Classification and Regression Trees (CART) [8, 9]. The error rate is comparable between such classifiers. 3 French modal corpus We used ESTER corpus [10], which is used in the French broadcast news evaluation. This corpus has not been designed a priori to do sentence modality recognition, then we decided to re-label it. We use the three punctuation marks?.! to extract from the raw ESTER corpus a set of sentences that belongs to each category. This first modal corpus (hereafter called original corpus) contains sentences (15619 declarations, 1339 exclamations and 1541 questions) for training and 895 sentences (581 declarations, 170 exclamations and 144 questions) for testing. Thanks to this automatic extraction the context of the sentences is lost. 3.1 Manual re-labeling of the corpus Manually labeling sentence modality without context is very subjective. We discovered, that different labels are often given by different persons for the same sentence. In this part, we will study the measure of overlapping of sentence types as a function of the listeners number. For following analysis, we concentrate only to the training part of the corpus. We chose the following classes for manual re-labeling of the corpus according to our application: D: declaration, E: exclamation, G:grammar question (the listener is able to distinguish this sentence only by its grammatical structure) Q: prosodic question (the listener is able to distinguish this class mainly by intonation), X: the listener is not able to determine the type of this sentence, E: errors (in this sentences is some noices, music, two people are speaking, etc., so it is consider as an error). Two types of questions are chosen in relation to our prosodic module. We suppose, that the prosodic module will not be able to detect grammar question, because prosodic information may be here close to the affirmative sentences. We chose accidentally 400 questions from the original corpus and four listeners re-labeled them. The table 1 shows the different results of the labeling for the different labelers (two, three and four). We can observe, the labeling is realy very subjective. Some sentences, initially labeled as questions, are now labeled as declarations, errors or sentences, which type is very difficult to chose. The number of exclamations is very small and two labelers find none of them. In the table 1 are not shown all combinations of listeners, because the analysis is the same. We can summarize: if the number of listeners is increasing, the number of common sentences is decreasing. L D Do E G Go Q Qo X Xo Relation between two listenes L L Relation between three listeners L L L Relation between four listeners L L L L Table 1: Relation between the listeners number and the overlapping in labeling: the first part shows the relation between two listeners; the second, between three and the last one between four listeners; Xo is the number of common labeled sentences for the class X. In relation to our application, we analyse in details only the questions. The figure 1 shows the decrease of number of common labeled questions (grammar and

3 prosodic) in function of the number of listeners. After the first re-labeling, the number of questions is decreasing by about 50 % (more precisely down to 247 by listener L1 and down to 177 by listener L2). After a 2nd re-labeling the number of common question was reduced down to about 100 questions, chosen by both listeners. When the third listener heard the sentences, the common part for questions is reduced to only 70 sentences. After the last listener we had only 45 questions. This number represents only 11 % of the primary corpus. This problem can be explained by the following reasons: the context of the dialog is essential for the sentences modality recognition, some sentences can belong to several modalities, the listeners make errors when labeling. Figure 1: Number of common labeled questions (grammar and prosodic) in function of listeners number: G curve = grammar questions; Q curve = prosodic questions; G + Q curve = union of G and Q questions 4 Automatic sentence modality recognition In this section are described the steps, the needs for ASMR and our recognition accuracy. 4.1 Attributes choice We chose only the basic prosodic attributes: F0 and energy, because some [5, 6] other studies used them for automatic modality recognition with successfull results. On top of that, it is not very difficult to compute them. For F0 calculus is used the autocorelation function [11]. 4.2 Attributes extraction The second step is attributes extraction. Our approach is based on the following principe. We calculate F0 and energy values for each microsegment of the speech. F0 for unvoiced parts of the signal is completed by a linear interpolation. Then, each sentence is decomposed into 20 segments and for each segment the average value of F0 and energy is computed. We obtain 20 values of F0 and 20 values of energy. This features number is chosen experimentally [6]. 4.3 Classifier choice In the literature [9] it is shown that the classifier is not very important for our goal. For this reason and in order to simplify the work in this part of the research, we chose two basic classifiers NN and GMM. 4.4 Recognition accuracy The table 2 shows the accuracy of the ASMR in French for different features, different classifiers and their combination. The Q class is here only the prosodic questions. The grammar questions are excluded from this experiment, because we assume that they can not be detected by prosody. The expected accuracy is not very good in relationship with our application. We need an accuracy above 90 %. One possible reason is, that only the basic prosodic features are not enough discriminating for this task. We analyse our corpus for confirming or disapproving this hypothesis. ACC in [%] Feature Classifier D E Q total F0 GMM F0, E MLP F0, E GMM, MLP Table 2: Modal recognition ACC for different prosodic features and classifiers in % 5 Study of the French ESTER corpus The first study is the observation of the F0 slope at the end of the sentence. It is performed for testing the basic prosodic rules described in section. The end of sentence means the last segment of 0.7 s duration. The four values of F0 are computed for this segment by an autocorelation function. We use the linear regression

4 of these four values for analysing of the F0 slope. The table 3 shows the number of sentences according to following rules. The column with + symbol represents the sentences with positive F0 slope and - with negative F0 slope. This first analysis separates the linear regression values into only two intervals. The next analysis divides the linear regression values into three intervals. In the first one are all values of linear regression greater as 0.03 (marked as ++ in the table). It may be a characteristic for the questions. The second interval (marked as 0 ) is [0.03; 0.03]. The sentences with linear regression coefficients smaller that 0.03 (market as - - ) are in the last column of the table. It may be a characteristic of declarative sentences. We can conclude, that the majority (80 %) of prosodic questions respect the basic prosodic rule: the final F0 slope is inceasing. Only 59 % of declarations have a decreasing final F0 slope and approximately half of the grammar questions an increasing final F0 slope. This analysis confirms the good accuracy of question detection score by F0 features only. Conversely, the grammar questions, as mentioned in section, cannot be detected only by F0 features. The number of exclamations is not sufficient for performing this study, therefore it is not shown in the table. Class D G Q most discriminating. The ending F0 slope for prosodic question is clearly increasing and for the other two sentences types it is falling or neutral. These two types (D and G) are very close, which leads to some confusion in ASMR, if only F0 features are used. The behaviour of energy is difficult to explain from figure 3. We suppose, that energy is less discriminating, than F0, because the variance of energy is 10 x greater than the variance of F0 and the overlapping of the sentences in the different classes will be greater. For this hypothesis the histogram of F0 and especially of energy has been created. Figure 2: F0 curves for three types of sentences: D curve = declarations, G curve = grammar questions, Q curve = prosodic questions Table 3: Analysis of the slope of F0 curve at the end of sentences by linear regression in % The second analysis is the observation of F0 and energy curve, represented by all computed F0 and energy features. We compute the mean and variance values for all features. The means of F0 are shown in the figure 2 and the means of energy in the figure 3. The variance is not shown, because the figure would be difficult to be read. The variances for F0 are in interval (0; 0,02] and can be neglected. Conversely, the variance for energy is in interval [0,01; 0,2], which can be very important for ASMR. We can see, that the first two third of the F0 values for all classes are close, which is useless for modality recognition. The last third of the segment is the Figure 3: Energy curves for three types of sentences: D curve = declarations, G curve = grammar questions, Q curve = prosodic questions The figure 4 shows an important overlapping (F0 and energy too), particularly between the classes of grammar questions and declaration. This fact explains most of the confusions between these two classes.

5 Czech Republic under contract number 201/02/ References: [1] P. Kral and J. Kleckova, Speech recognition and animation of talking head, in IWSSIP 03, Prague, Czech Republic, September [2] H. Gezundhajt, La prosodie, in linguistes.com/phonetique/prosodie.html [3] P. Martin, L intonation en parole spontané, in Revue Franaise de Linguistique Appliqué, Paris, France, 2000, vol. IV-2, pp [4] R. Kompe, Prosody in Speech Understanding Systems, Springer, July [5] V. Strom, Detection of accents, phrase boundaries and sentence modality in german with prosodic features, in Eurospeech 95, Madrid, Figure 4: Histograms of F0 (left) and energy (right) slopes. From top to bottom: declarations, grammar questions and prosodic questions. 6 Conclusion Our analysis of the corpus shows that it is not possible to recognize all sentences types only with basic prosodic features (F0 and energy) in real conditions with a good accuracy. It is due to an important overlapping between the features values in the classes. The most discriminating is the class Q, where the accuracy is about 84 %. The recognition accuracy of the others classes (E and D) is about 50 %, which is not sufficient for our application. We will include other informations, such as language models and the semantic to improve the accuracy of ASMR. 7 Acknowledgements This work would not have been possible without the aid of Daniel Dechelot and Emanuel Didiot from French laboratory Loria, who is participed to the manual corpus re-labeling. The greatest thanks to Christophe Cerisara from the same laboratory for his remarks, help from his contribution and for the manual corpus re-labeling as well. The work presented in this paper was partly supported by the Grant Agency of [6] J. Kleckova and V. Matousek, Using prosodic characteristics in Czech dialog system, in Interact 97, [7] K. Chongdok and Y. Hiyon, Defining modality by terminal contours in standard korean, in 1st International Conference on Speech Sciences, Seoul, [8] H. Wright, M. Poesio, and S. Isard, Using high level dialogue information for dialogue act recognition using prosodic features, in ESCA Workshop on Prosody and Dialogue, Eindhoven, Holland, September [9] H. Wright, Automatic utterance type detection using suprasegmental features, in ICSLP 98, Sydney, 1998, p [10] [11] A. de Cheveigne and H. Kawahara, Comparative evaluation of F estimation algorithms, in Eurospeech 2001, Scandinavia, 2001.

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

Eyebrows in French talk-in-interaction

Eyebrows in French talk-in-interaction Eyebrows in French talk-in-interaction Aurélie Goujon 1, Roxane Bertrand 1, Marion Tellier 1 1 Aix Marseille Université, CNRS, LPL UMR 7309, 13100, Aix-en-Provence, France Goujon.aurelie@gmail.com Roxane.bertrand@lpl-aix.fr

More information

Human Emotion Recognition From Speech

Human Emotion Recognition From Speech RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Probability and Statistics Curriculum Pacing Guide

Probability and Statistics Curriculum Pacing Guide Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods

More information

Dialog Act Classification Using N-Gram Algorithms

Dialog Act Classification Using N-Gram Algorithms Dialog Act Classification Using N-Gram Algorithms Max Louwerse and Scott Crossley Institute for Intelligent Systems University of Memphis {max, scrossley } @ mail.psyc.memphis.edu Abstract Speech act classification

More information

Mandarin Lexical Tone Recognition: The Gating Paradigm

Mandarin Lexical Tone Recognition: The Gating Paradigm Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

STA 225: Introductory Statistics (CT)

STA 225: Introductory Statistics (CT) Marshall University College of Science Mathematics Department STA 225: Introductory Statistics (CT) Course catalog description A critical thinking course in applied statistical reasoning covering basic

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

Revisiting the role of prosody in early language acquisition. Megha Sundara UCLA Phonetics Lab

Revisiting the role of prosody in early language acquisition. Megha Sundara UCLA Phonetics Lab Revisiting the role of prosody in early language acquisition Megha Sundara UCLA Phonetics Lab Outline Part I: Intonation has a role in language discrimination Part II: Do English-learning infants have

More information

IEEE Proof Print Version

IEEE Proof Print Version IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 1 Automatic Intonation Recognition for the Prosodic Assessment of Language-Impaired Children Fabien Ringeval, Julie Demouy, György Szaszák, Mohamed

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Individual Differences & Item Effects: How to test them, & how to test them well

Individual Differences & Item Effects: How to test them, & how to test them well Individual Differences & Item Effects: How to test them, & how to test them well Individual Differences & Item Effects Properties of subjects Cognitive abilities (WM task scores, inhibition) Gender Age

More information

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Navdeep Jaitly 1, Vincent Vanhoucke 2, Geoffrey Hinton 1,2 1 University of Toronto 2 Google Inc. ndjaitly@cs.toronto.edu,

More information

Assignment 1: Predicting Amazon Review Ratings

Assignment 1: Predicting Amazon Review Ratings Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for

More information

Statewide Framework Document for:

Statewide Framework Document for: Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance

More information

Word Stress and Intonation: Introduction

Word Stress and Intonation: Introduction Word Stress and Intonation: Introduction WORD STRESS One or more syllables of a polysyllabic word have greater prominence than the others. Such syllables are said to be accented or stressed. Word stress

More information

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

More information

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion

More information

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,

More information

Rule Learning With Negation: Issues Regarding Effectiveness

Rule Learning With Negation: Issues Regarding Effectiveness Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

CEFR Overall Illustrative English Proficiency Scales

CEFR Overall Illustrative English Proficiency Scales CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey

More information

A study of speaker adaptation for DNN-based speech synthesis

A study of speaker adaptation for DNN-based speech synthesis A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

More information

The Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh

The Effect of Discourse Markers on the Speaking Production of EFL Students. Iman Moradimanesh The Effect of Discourse Markers on the Speaking Production of EFL Students Iman Moradimanesh Abstract The research aimed at investigating the relationship between discourse markers (DMs) and a special

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Ch 2 Test Remediation Work Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) High temperatures in a certain

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Running head: DELAY AND PROSPECTIVE MEMORY 1

Running head: DELAY AND PROSPECTIVE MEMORY 1 Running head: DELAY AND PROSPECTIVE MEMORY 1 In Press at Memory & Cognition Effects of Delay of Prospective Memory Cues in an Ongoing Task on Prospective Memory Task Performance Dawn M. McBride, Jaclyn

More information

Ph.D. in Behavior Analysis Ph.d. i atferdsanalyse

Ph.D. in Behavior Analysis Ph.d. i atferdsanalyse Program Description Ph.D. in Behavior Analysis Ph.d. i atferdsanalyse 180 ECTS credits Approval Approved by the Norwegian Agency for Quality Assurance in Education (NOKUT) on the 23rd April 2010 Approved

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,

More information

Rhythm-typology revisited.

Rhythm-typology revisited. DFG Project BA 737/1: "Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited." Rhythm-typology revisited. B. Andreeva & W. Barry Jacques

More information

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 -

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 - C.E.F.R. Oral Assessment Criteria Think A F R I C A - 1 - 1. The extracts in the left hand column are taken from the official descriptors of the CEFR levels. How would you grade them on a scale of low,

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

First Grade Curriculum Highlights: In alignment with the Common Core Standards

First Grade Curriculum Highlights: In alignment with the Common Core Standards First Grade Curriculum Highlights: In alignment with the Common Core Standards ENGLISH LANGUAGE ARTS Foundational Skills Print Concepts Demonstrate understanding of the organization and basic features

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix

More information

Language Acquisition Chart

Language Acquisition Chart Language Acquisition Chart This chart was designed to help teachers better understand the process of second language acquisition. Please use this chart as a resource for learning more about the way people

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY

BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY Sergey Levine Principal Adviser: Vladlen Koltun Secondary Adviser:

More information

Grammar Lesson Plan: Yes/No Questions with No Overt Auxiliary Verbs

Grammar Lesson Plan: Yes/No Questions with No Overt Auxiliary Verbs Grammar Lesson Plan: Yes/No Questions with No Overt Auxiliary Verbs DIALOGUE: Hi Armando. Did you get a new job? No, not yet. Are you still looking? Yes, I am. Have you had any interviews? Yes. At the

More information

Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers

Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers Chad Langley, Alon Lavie, Lori Levin, Dorcas Wallace, Donna Gates, and Kay Peterson Language Technologies Institute Carnegie

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda Content Language Objectives (CLOs) Outcomes Identify the evolution of the CLO Identify the components of the CLO Understand how the CLO helps provide all students the opportunity to access the rigor of

More information

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

More information

International Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012

International Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012 Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of

More information

Segregation of Unvoiced Speech from Nonspeech Interference

Segregation of Unvoiced Speech from Nonspeech Interference Technical Report OSU-CISRC-8/7-TR63 Department of Computer Science and Engineering The Ohio State University Columbus, OH 4321-1277 FTP site: ftp.cse.ohio-state.edu Login: anonymous Directory: pub/tech-report/27

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

CS Machine Learning

CS Machine Learning CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing

More information

Rule Learning with Negation: Issues Regarding Effectiveness

Rule Learning with Negation: Issues Regarding Effectiveness Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX

More information

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Multi-modal Sensing and Analysis of Poster Conversations toward Smart Posterboard

Multi-modal Sensing and Analysis of Poster Conversations toward Smart Posterboard Multi-modal Sensing and Analysis of Poster Conversations toward Smart Posterboard Tatsuya Kawahara Kyoto University, Academic Center for Computing and Media Studies Sakyo-ku, Kyoto 606-8501, Japan http://www.ar.media.kyoto-u.ac.jp/crest/

More information

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu

More information

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

More information

A Grammar for Battle Management Language

A Grammar for Battle Management Language Bastian Haarmann 1 Dr. Ulrich Schade 1 Dr. Michael R. Hieb 2 1 Fraunhofer Institute for Communication, Information Processing and Ergonomics 2 George Mason University bastian.haarmann@fkie.fraunhofer.de

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

The influence of metrical constraints on direct imitation across French varieties

The influence of metrical constraints on direct imitation across French varieties The influence of metrical constraints on direct imitation across French varieties Mariapaola D Imperio 1,2, Caterina Petrone 1 & Charlotte Graux-Czachor 1 1 Aix-Marseille Université, CNRS, LPL UMR 7039,

More information

Rachel E. Baker, Ann R. Bradlow. Northwestern University, Evanston, IL, USA

Rachel E. Baker, Ann R. Bradlow. Northwestern University, Evanston, IL, USA LANGUAGE AND SPEECH, 2009, 52 (4), 391 413 391 Variability in Word Duration as a Function of Probability, Speech Style, and Prosody Rachel E. Baker, Ann R. Bradlow Northwestern University, Evanston, IL,

More information

School of Innovative Technologies and Engineering

School of Innovative Technologies and Engineering School of Innovative Technologies and Engineering Department of Applied Mathematical Sciences Proficiency Course in MATLAB COURSE DOCUMENT VERSION 1.0 PCMv1.0 July 2012 University of Technology, Mauritius

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Physics 270: Experimental Physics

Physics 270: Experimental Physics 2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu

More information

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

More information

AP Statistics Summer Assignment 17-18

AP Statistics Summer Assignment 17-18 AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic

More information

Generative models and adversarial training

Generative models and adversarial training Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?

More information

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques

Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques Lorene Allano 1*1, Andrew C. Morris 2, Harin Sellahewa 3, Sonia Garcia-Salicetti 1, Jacques Koreman 2, Sabah Jassim

More information

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com

More information

Interpreting ACER Test Results

Interpreting ACER Test Results Interpreting ACER Test Results This document briefly explains the different reports provided by the online ACER Progressive Achievement Tests (PAT). More detailed information can be found in the relevant

More information

SEGMENTAL FEATURES IN SPONTANEOUS AND READ-ALOUD FINNISH

SEGMENTAL FEATURES IN SPONTANEOUS AND READ-ALOUD FINNISH SEGMENTAL FEATURES IN SPONTANEOUS AND READ-ALOUD FINNISH Mietta Lennes Most of the phonetic knowledge that is currently available on spoken Finnish is based on clearly pronounced speech: either readaloud

More information

Journal of Phonetics

Journal of Phonetics Journal of Phonetics 41 (2013) 297 306 Contents lists available at SciVerse ScienceDirect Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics The role of intonation in language and

More information

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

LISTENING STRATEGIES AWARENESS: A DIARY STUDY IN A LISTENING COMPREHENSION CLASSROOM

LISTENING STRATEGIES AWARENESS: A DIARY STUDY IN A LISTENING COMPREHENSION CLASSROOM LISTENING STRATEGIES AWARENESS: A DIARY STUDY IN A LISTENING COMPREHENSION CLASSROOM Frances L. Sinanu Victoria Usadya Palupi Antonina Anggraini S. Gita Hastuti Faculty of Language and Literature Satya

More information

Organizing Comprehensive Literacy Assessment: How to Get Started

Organizing Comprehensive Literacy Assessment: How to Get Started Organizing Comprehensive Assessment: How to Get Started September 9 & 16, 2009 Questions to Consider How do you design individualized, comprehensive instruction? How can you determine where to begin instruction?

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Automatic intonation assessment for computer aided language learning

Automatic intonation assessment for computer aided language learning Available online at www.sciencedirect.com Speech Communication 52 (2010) 254 267 www.elsevier.com/locate/specom Automatic intonation assessment for computer aided language learning Juan Pablo Arias a,

More information

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature 1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 8, NOVEMBER 2009 1567 Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Approaches for analyzing tutor's role in a networked inquiry discourse

Approaches for analyzing tutor's role in a networked inquiry discourse Lakkala, M., Muukkonen, H., Ilomäki, L., Lallimo, J., Niemivirta, M. & Hakkarainen, K. (2001) Approaches for analysing tutor's role in a networked inquiry discourse. In P. Dillenbourg, A. Eurelings., &

More information

Lecture 2: Quantifiers and Approximation

Lecture 2: Quantifiers and Approximation Lecture 2: Quantifiers and Approximation Case study: Most vs More than half Jakub Szymanik Outline Number Sense Approximate Number Sense Approximating most Superlative Meaning of most What About Counting?

More information