CHINESE TIMIT: A TIMIT-LIKE CORPUS OF STANDARD CHINESE
Jiahong Yuan 1, Hongwei Ding 2, Sishi Liao 2, Yuqing Zhan 2, and Mark Liberman 1
1 Linguistic Data Consortium, University of Pennsylvania
2 Institute of Cross-Linguistic Processing and Cognition, Shanghai Jiao Tong University

ABSTRACT

This paper describes an effort to build a TIMIT-like corpus in Standard Chinese, which is part of our Global TIMIT project. Three steps are involved and detailed in the paper: selection of sentences; speaker recruitment and recording; and phonetic segmentation. The corpus consists of 6000 sentences read by 50 speakers (25 females and 25 males). Phonetic segmentation obtained from forced alignment is provided, which has 93.2% agreement (of phone boundaries) within 20 ms compared to manual segmentation on 50 randomly selected sentences. Statistics on the number of tokens and mean duration of phones and tones in the corpus are also reported. Males have shorter phones/tones but more and longer utterance-internal silences than females, indicating that males in this dataset speak faster but pause more frequently and longer.

Index Terms: TIMIT, forced alignment, maximum coverage, Standard Chinese

1. INTRODUCTION

Since it was created three decades ago, the TIMIT speech corpus has been widely used in speech science and speech technology development [1-3]. The great success of TIMIT prompted the ongoing effort at the Linguistic Data Consortium to create Global TIMIT, a series of TIMIT-like corpora in a number of languages [4].

The original TIMIT dataset contains a total of 6300 sentence tokens, 10 sentences spoken by each of 630 speakers from eight major dialect regions of the United States. The sentence prompts include 2 dialect Shibboleth sentences (SA), 450 phonetically-compact sentences (SX), and 1890 phonetically-diverse sentences (SI).
The dialect Shibboleth and phonetically-compact sentences were elaborately designed, whereas the phonetically-diverse sentences were selected from existing text sources.

The design of Global TIMIT adopts a scheme different from that of the original TIMIT. Instead of having 630 speakers and 10 sentences per speaker, the new design has 50 speakers and 120 sentences per speaker. This makes the corpus size comparable to the original TIMIT but requires much less time and effort for recruiting and recording. Among the 120 sentences read by a speaker, 20 are Calibration sentences, read by all speakers; 40 are Shared sentences, read by 10 speakers; and 60 are Unique sentences, read by only one speaker. The total number of sentence types is, therefore, 20 + 40*(50/10) + 60*50 = 3220. The design is summarized in Table 1.

Table 1: The design of Global TIMIT.

Sentence Type   #Sentences   #Speakers/Sentence   Total   #Sentences/Speaker
Calibration     20           50                   1000    20
Shared          200          10                   2000    40
Unique          3000         1                    3000    60
Total           3220                              6000    120

The creation of a TIMIT-like corpus consists of three steps: design or selection of sentences; speaker recruitment and recording; and phonetic transcription and segmentation. This paper describes our effort to build Chinese TIMIT in these steps.

2. SENTENCE SELECTION

2.1. Candidate sentences

All sentences were selected from the Chinese Gigaword Fifth Edition corpus [5], which is a comprehensive archive of newswire text data from Chinese news sources. 5000 candidate sentences were selected from the corpus by the following steps:

1. Extract sentences that are [...] characters long, excluding those containing characters that are not on the list of the 3500 most frequently used Chinese characters (现代汉语常用字表);
2. Manually go through the list of extracted sentences in a random order, to remove those with uncommon words (e.g., person or place names) or inappropriate meaning (e.g., politically sensitive viewpoints), and also to segment the sentences into words.
This was done until a pool of 5000 candidate sentences was generated, containing approximately 6600 unique words and 2200 unique characters. Calibration, Shared, and Unique sentences were selected from the candidate pool using computer algorithms. A pronouncing dictionary was made for sentence selection and
phonetic segmentation. The dictionary and the sentence selection procedure are described in the following sections.

2.2. Pronouncing dictionary

The pronouncing dictionary transcribes only the canonical pronunciation of a word as it appears in the dataset. Only a few words have more than one pronunciation, for which all pronunciations were listed. Hanyu Pinyin was used to transcribe the pronunciation, including initials, finals, and tone. A final in Mandarin Chinese may consist of one or more vowels (or vowels and glides, depending on the adopted phonological analysis), with or without a nasal coda. Because /o/ and /uo/ occur in complementary distribution and the acoustic difference between the two finals is negligible [6], they were treated as the same final. /i/ has three pronunciation variants, often transcribed as [ɿ] (when appearing after an alveolar fricative/affricate), [ʅ] (when appearing after a retroflex fricative/affricate), and [i] (in all other contexts). The three variants were treated as different finals: /i/ for [i], /ii/ for [ɿ], and /iii/ for [ʅ]. In total, there were 21 initials and 36 finals. Tones were marked on the finals, including Tone1 through Tone4, and Tone0 for the neutral tone. The phonetic labels are listed in Table 2.

Table 2: Phonetic labels (in Pinyin).

Initials   b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s
Finals     a, ai, an, ang, ao
           e, ei, en, eng, er
           i, ii, iii, ia, ian, iang, iao, ie, in, ing, iong, iu
           ong, ou
           u, ua, uai, uan, uang, ui, un, uo
           v, van, ve, vn *
Tones      1, 2, 3, 4, 0
Silence    sil

* v represents ü in Pinyin, ii is for [ɿ], and iii is for [ʅ].

2.3. Selecting sentences

Twenty Calibration sentences were selected from the candidate pool to cover the maximum number of (tone-independent) syllable types in the language.
This problem is known to be NP-hard, but it can be approximately solved using greedy approximation [7]:

Greedy Approximation:
  covered set is empty
  Repeat
    Pick the sentence with the maximum number of syllable types not in the covered set
    Add syllable types in the chosen sentence into the covered set
  Until 20 sentences are selected

As illustrated in Figure 1, we randomized the candidate sentences before the selection, and repeated the procedure 1000 times to obtain 1000 sets of 20 sentences. The set that contains the largest number of tone-independent syllable types was used as the Calibration sentences.

Figure 1: Procedure for selecting Calibration sentences.

Shared sentences were selected to cover the maximum number of tones and (within-word) tonal combinations. We need five sets of Shared sentences: each set has 40 sentences and will be read by 10 speakers. The first 20 sentences were selected to have at least five occurrences for each of the mono- and bi-tones. The second 20 sentences were selected to cover the maximum number of three- and four-tone combinations. The procedure was similar to that used for selecting Calibration sentences.

Unique sentences were randomly selected from the remaining sentences in the candidate pool. 50 sets of 60 sentences were selected, each to be read by one speaker only.

3. SPEAKER RECRUITMENT AND RECORDING

Fifty college students at Shanghai Jiao Tong University, 25 females and 25 males, were recruited to read the sentences. All of them speak Standard Chinese. As a criterion to determine whether a subject speaks Standard Chinese, his/her spoken Mandarin proficiency as assessed by Putonghua Shuiping Ceshi (the national standard Mandarin proficiency test) was used. There are seven levels of proficiency assessed by the test, which are, from highest to lowest: Class 1 Level 1, Class 1 Level 2, Class 2 Level 1, Class 2 Level 2, Class 3 Level 1, Class 3 Level 2, and Failed.
In order to qualify for teaching K-12, one must pass Class 2 Level 2. The speakers recruited for the experiment all achieved Class 2 Level 1 or better on Putonghua Shuiping Ceshi. The recording was made in a sound-treated recording booth at Shanghai Jiao Tong University, using the SpeechRecorder Software [8]. The sentences were displayed on a computer screen for subjects to read, one at a time, controlled by the person who monitored the recording. A total of 6000 utterances were recorded, 120 utterances for each speaker.
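Looking back at the Calibration-sentence selection described under "Selecting sentences", the greedy approximation and its randomized restarts can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the input format (a dict mapping a sentence id to its set of tone-independent syllable types), and the seed parameter are all assumptions introduced here.

```python
import random


def greedy_cover(sentences, k=20):
    """Greedily pick up to k sentences covering many distinct syllable types.

    `sentences` maps a sentence id to the set of tone-independent syllable
    types it contains (a hypothetical input format).
    """
    covered = set()
    chosen = []
    remaining = dict(sentences)
    while remaining and len(chosen) < k:
        # Pick the sentence that adds the most syllable types not yet covered.
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


def best_of_restarts(sentences, k=20, restarts=1000, seed=0):
    """Repeat the greedy pass on shuffled input and keep the best set."""
    rng = random.Random(seed)
    ids = list(sentences)
    best_chosen, best_covered = [], set()
    for _ in range(restarts):
        rng.shuffle(ids)
        # Shuffling changes how ties are broken, so runs can differ.
        shuffled = {sid: sentences[sid] for sid in ids}
        chosen, covered = greedy_cover(shuffled, k)
        if len(covered) > len(best_covered):
            best_chosen, best_covered = chosen, covered
    return best_chosen, best_covered


# Toy pool with made-up sentence ids and syllables.
pool = {
    "s1": {"ba", "ma"},
    "s2": {"ma", "ni", "hao"},
    "s3": {"ba"},
    "s4": {"shi", "de"},
}
chosen, covered = greedy_cover(pool, k=2)
```

In this sketch the shuffle only affects which sentence wins when several add the same number of new syllable types, which is one plausible reason for repeating the procedure on randomized input and keeping the set with the widest coverage.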
4. PHONETIC SEGMENTATION

4.1. Forced Alignment

HMM/GMM-based forced alignment was applied to obtain phonetic segmentation. In prior work [9,10], we demonstrated that employing explicit phone boundary models within the HMM framework could significantly improve forced alignment accuracy for both English and Mandarin Chinese. The phone boundary models were a special 1-state HMM (as shown in Figure 2), in which the state cannot repeat itself.

Figure 2: Special 1-state HMM for phone boundaries, with transition probabilities a_01 = a_12 = 1.

Therefore, a boundary can have one and only one state occurrence, i.e., it is aligned with only one frame. The special 1-state phone boundary HMMs were combined with standard monophone HMMs. Given a phonetic transcription, phone boundaries were inserted between phones. For example, sil i g e sil becomes sil sil_i i i_g g g_e e e_sil sil. The boundary states were tied through decision-tree based clustering, similar to triphone state tying developed in speech recognition.

We started with the acoustic models trained on Hub4 Mandarin Broadcast News Speech [11], and retrained the models by combining the Broadcast News Speech data and our recordings (training on the combined data sets gave better results than training on Chinese TIMIT data only). Tone-independent models were employed. The acoustic features were the standard 39 PLPs extracted with a 25 ms Hamming window and a 10 ms frame rate. Initials, monophthong finals (/a, e, i, ii, iii, u, v/), and silence were 3-state HMMs; all other finals (including diphthongs, triphthongs, and nasal-coda finals) were 5-state HMMs. Each state had 2 Gaussian mixture components with diagonal covariance matrices. The system was built using the HTK Toolkit [12].

4.2. Evaluation of segmentation accuracy

To evaluate segmentation accuracy, 50 randomly selected sentences were manually corrected by three of the authors.
Excluding the boundaries between silence and a stop or an affricate, where the boundary cannot be determined because of the stop closure, there are 1431 boundaries in the 50 sentences. 93.2% of the boundaries (1333 boundaries) have an agreement of within 20 ms between forced alignment and manual segmentation, which is on par with state-of-the-art results in terms of accuracy of automatic phonetic segmentation.

5. STATISTICS OF THE CORPUS

5.1. Statistics of phones

Based on the phonetic segmentation of the corpus, we calculated the total number of occurrences of every phone and its mean duration. The results are listed in Table 3, in which males and females are calculated separately.

Table 3: Number of tokens and mean duration of phones in the corpus.
[Table body not recoverable. Columns: Phone; #tokens (all); Male # and duration (sec.); Female # and duration (sec.). Rows: /b/ /p/ /m/ /f/ /d/ /t/ /n/ /l/ /g/ /k/ /h/ /j/ /q/ /x/ /zh/ /ch/ /sh/ /r/ /z/ /c/ /s/ /a/ /e/ /i/ /ii/ /iii/ /u/ /v/ /ai/ /ao/ /ei/ /er/ /ia/ /iao/ /ie/ /iu/ /ou/ /ua/ /uai/ /ui/ /uo/ /ve/ /an/ /ang/ /en/ /eng/ /ian/ /iang/ /in/ /ing/ /iong/ /ong/ /uan/ /uang/ /un/ /van/ /vn/; Pause (all); Pause (Calibration).]

Interestingly, we can see from the table that males have a shorter duration across phones than females. A paired-samples t-test shows that the difference is statistically significant (p < 0.001). This result suggests that males speak faster than females. On the other hand, males made more pauses (976 vs. 754) and longer pauses ([...] sec. vs. [...] sec.) than females in the corpus (utterance-internal silences longer than 50 ms were counted as pauses). Because textual factors such as sentence length and syntactic complexity affect pause production, we also calculated pauses in the Calibration sentences only, to remove the effects of those factors on the difference between males and females (they read the same sentences). The result is listed at the end of Table 3. For the Calibration sentences only, males still made more pauses (179 vs. 147) and longer pauses ([...] sec. vs. [...] sec.) than females.

5.2. Statistics of tones

The number of tokens and mean duration of tones (entire syllables) are listed in Table 4 and shown in Figure 3. We can see that Tone0 is the shortest; Tone1 and Tone2 are longer than Tone3 and Tone4. And again, males have a shorter duration on every tone than females.

Table 4: Number of tokens and mean duration of tones in the corpus.
[Table body not recoverable. Columns: Tone; #tokens (all); Male # and duration (sec.); Female # and duration (sec.). Rows: the five tones.]

Figure 3: Mean duration of tones in the corpus.

6. CONCLUSION

In this paper, we detailed the development of a TIMIT-like corpus in Standard Chinese. A simple analysis of the corpus shows that males speak faster but pause more frequently and longer than females. This result is consistent with our previous investigation of this topic based on telephone conversations and monologue speech [13, 14].
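The pause comparison above rests on a simple rule stated in Section 5.1: utterance-internal silences longer than 50 ms count as pauses. A minimal Python sketch of that rule is given below. It assumes a hypothetical segmentation format, a list of (label, start, end) tuples in seconds with silence labeled sil as in Table 2, and it treats the first and last segments of an utterance as non-internal; neither detail is specified in the paper.

```python
def internal_pauses(segments, min_dur=0.050, sil_label="sil"):
    """Return durations (sec.) of utterance-internal silences above min_dur.

    `segments` is a list of (label, start, end) tuples in seconds, a
    hypothetical representation of one force-aligned utterance.
    """
    pauses = []
    last = len(segments) - 1
    for i, (label, start, end) in enumerate(segments):
        if label != sil_label or i == 0 or i == last:
            continue  # skip phones and the leading/trailing segment
        dur = end - start
        if dur > min_dur:
            pauses.append(dur)
    return pauses


# Made-up alignment: one 80 ms internal silence and one 30 ms internal silence.
utterance = [
    ("sil", 0.00, 0.30), ("n", 0.30, 0.36), ("i", 0.36, 0.50),
    ("sil", 0.50, 0.58), ("h", 0.58, 0.65), ("ao", 0.65, 0.85),
    ("sil", 0.85, 0.88), ("m", 0.88, 0.95), ("a", 0.95, 1.10),
    ("sil", 1.10, 1.40),
]
pauses = internal_pauses(utterance)
```

Summing the lengths of such lists per speaker group would give pause counts, and averaging the durations would give mean pause length, which are the two quantities compared between males and females.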
Along with Chinese TIMIT, we have also created an L2 English TIMIT, for which the same 50 speakers read easy sentences selected from the original TIMIT. We plan to extend the effort to L2 Chinese and L1 English, to provide a basis for a four-way comparison between L1 and L2 and between Chinese and English.
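For readers who want to reproduce the segmentation accuracy figure reported in Section 4, the agreement measure can be sketched in Python as below. The function name and the input format (two equal-length lists of paired boundary times in seconds) are assumptions, as is counting a difference of exactly 20 ms as agreement; the paper only says "within 20 ms".

```python
def boundary_agreement(auto, manual, tol=0.020):
    """Fraction of paired boundaries whose times differ by at most tol sec."""
    if len(auto) != len(manual):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto:
        raise ValueError("no boundaries to compare")
    hits = sum(1 for a, m in zip(auto, manual) if abs(a - m) <= tol)
    return hits / len(auto)


# Made-up boundary times (sec.): three of four pairs agree within 20 ms.
auto_times = [0.10, 0.25, 0.40, 0.62]
manual_times = [0.11, 0.25, 0.45, 0.61]
agreement = boundary_agreement(auto_times, manual_times)

# The reported figure follows from the counts given in the paper.
reported = round(100 * 1333 / 1431, 1)
```

Here agreement is 0.75 for the toy data, and reported evaluates to 93.2, matching the 1333 of 1431 boundaries stated in the evaluation.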
7. REFERENCES

[1] V. Zue, Speech Database Development, Final Technical Report submitted to the Defense Advanced Research Projects Agency (for Contract # C-0341, June June 1987).
[2] V. Zue, S. Seneff, and J. Glass, Speech database development at MIT: TIMIT and beyond, Speech Communication 9(4).
[3] J. Garofolo et al., TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1), Linguistic Data Consortium.
[4] N. Chanchaochai, J. Yuan, J. Wright, C. Cieri, and M. Liberman, Global TIMIT: Towards Creating TIMIT-analogous Speech Corpora, manuscript.
[5] R. Parker et al., Chinese Gigaword Fifth Edition (LDC2011T13), Linguistic Data Consortium.
[6] J. Yuan, The spectral dynamics of vowels in Mandarin Chinese, Proceedings of Interspeech 2013.
[7] U. Feige, A Threshold of ln n for Approximating Set Cover, J. of the ACM 45(5).
[8] C. Draxler and K. Jänsch, SpeechRecorder - a Universal Platform Independent Multi-Channel Audio Recording Software, Proceedings of LREC.
[9] J. Yuan, N. Ryant, M. Liberman, A. Stolcke, V. Mitra, and W. Wang, Automatic phonetic segmentation using boundary models, Proceedings of Interspeech 2013.
[10] J. Yuan, N. Ryant, and M. Liberman, Automatic phonetic segmentation in Mandarin Chinese: Boundary models, glottal features and tone, Proceedings of ICASSP 2014.
[11] S. Huang et al., 1997 Mandarin Broadcast News Speech (HUB4-NE) (LDC98S73), Linguistic Data Consortium.
[12] S. Young et al., The HTK Book, Web Download.
[13] J. Yuan, M. Liberman, and C. Cieri, Towards an integrated understanding of speaking rate in conversation, Proceedings of Interspeech 2006.
[14] J. Yuan, X. Xu, W. Lai, and M. Liberman, Pauses and Pause Fillers in Mandarin Monologue Speech: The Effects of Sex and Proficiency, Proceedings of Speech Prosody 2016.
WWW.HOWEST.BE/STAGE VERSION April 1st 2017 * STAGE@ HOWEST.BE TRAINEESHIP TOOL MANUAL V2.1 Guidelines for the use of the Howest traineeship tool elaborated for an external organisation: stage.howest.be
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationProgram Matrix - Reading English 6-12 (DOE Code 398) University of Florida. Reading
Program Requirements Competency 1: Foundations of Instruction 60 In-service Hours Teachers will develop substantive understanding of six components of reading as a process: comprehension, oral language,
More informationVowel mispronunciation detection using DNN acoustic models with cross-lingual training
INTERSPEECH 2015 Vowel mispronunciation detection using DNN acoustic models with cross-lingual training Shrikant Joshi, Nachiket Deo, Preeti Rao Department of Electrical Engineering, Indian Institute of
More informationThe analysis starts with the phonetic vowel and consonant charts based on the dataset:
Ling 113 Homework 5: Hebrew Kelli Wiseth February 13, 2014 The analysis starts with the phonetic vowel and consonant charts based on the dataset: a) Given that the underlying representation for all verb
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationSpeech Translation for Triage of Emergency Phonecalls in Minority Languages
Speech Translation for Triage of Emergency Phonecalls in Minority Languages Udhyakumar Nallasamy, Alan W Black, Tanja Schultz, Robert Frederking Language Technologies Institute Carnegie Mellon University
More informationBAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass
BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,
More informationKenya: Age distribution and school attendance of girls aged 9-13 years. UNESCO Institute for Statistics. 20 December 2012
1. Introduction Kenya: Age distribution and school attendance of girls aged 9-13 years UNESCO Institute for Statistics 2 December 212 This document provides an overview of the pattern of school attendance
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationPHONETIC DISTANCE BASED ACCENT CLASSIFIER TO IDENTIFY PRONUNCIATION VARIANTS AND OOV WORDS
PHONETIC DISTANCE BASED ACCENT CLASSIFIER TO IDENTIFY PRONUNCIATION VARIANTS AND OOV WORDS Akella Amarendra Babu 1 *, Ramadevi Yellasiri 2 and Akepogu Ananda Rao 3 1 JNIAS, JNT University Anantapur, Ananthapuramu,
More informationDOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS. Elliot Singer and Douglas Reynolds
DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS Elliot Singer and Douglas Reynolds Massachusetts Institute of Technology Lincoln Laboratory {es,dar}@ll.mit.edu ABSTRACT
More informationLetter-based speech synthesis
Letter-based speech synthesis Oliver Watts, Junichi Yamagishi, Simon King Centre for Speech Technology Research, University of Edinburgh, UK O.S.Watts@sms.ed.ac.uk jyamagis@inf.ed.ac.uk Simon.King@ed.ac.uk
More informationAutomatic intonation assessment for computer aided language learning
Available online at www.sciencedirect.com Speech Communication 52 (2010) 254 267 www.elsevier.com/locate/specom Automatic intonation assessment for computer aided language learning Juan Pablo Arias a,
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationA Neural Network GUI Tested on Text-To-Phoneme Mapping
A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationSession 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design
Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Paper #3 Five Q-to-survey approaches: did they work? Job van Exel
More informationA Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique
A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University
More informationUTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation
UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation Taufiq Hasan Gang Liu Seyed Omid Sadjadi Navid Shokouhi The CRSS SRE Team John H.L. Hansen Keith W. Godin Abhinav Misra Ali Ziaei Hynek Bořil
More informationCharacterizing and Processing Robot-Directed Speech
Characterizing and Processing Robot-Directed Speech Paulina Varchavskaia, Paul Fitzpatrick, Cynthia Breazeal AI Lab, MIT, Cambridge, USA [paulina,paulfitz,cynthia]@ai.mit.edu Abstract. Speech directed
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationFirst Grade Curriculum Highlights: In alignment with the Common Core Standards
First Grade Curriculum Highlights: In alignment with the Common Core Standards ENGLISH LANGUAGE ARTS Foundational Skills Print Concepts Demonstrate understanding of the organization and basic features
More informationCorpus Linguistics (L615)
(L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives
More informationRichardson, J., The Next Step in Guided Writing, Ohio Literacy Conference, 2010
1 Procedures and Expectations for Guided Writing Procedures Context: Students write a brief response to the story they read during guided reading. At emergent levels, use dictated sentences that include
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationDIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE
2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE Shaofei Xue 1
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationArabic Orthography vs. Arabic OCR
Arabic Orthography vs. Arabic OCR Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among
More informationThe Oregon Literacy Framework of September 2009 as it Applies to grades K-3
The Oregon Literacy Framework of September 2009 as it Applies to grades K-3 The State Board adopted the Oregon K-12 Literacy Framework (December 2009) as guidance for the State, districts, and schools
More information1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature
1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details
More informationA Cross-language Corpus for Studying the Phonetics and Phonology of Prominence
A Cross-language Corpus for Studying the Phonetics and Phonology of Prominence Bistra Andreeva 1, William Barry 1, Jacques Koreman 2 1 Saarland University Germany 2 Norwegian University of Science and
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George
More informationSegregation of Unvoiced Speech from Nonspeech Interference
Technical Report OSU-CISRC-8/7-TR63 Department of Computer Science and Engineering The Ohio State University Columbus, OH 4321-1277 FTP site: ftp.cse.ohio-state.edu Login: anonymous Directory: pub/tech-report/27
More informationThe pronunciation of /7i/ by male and female speakers of avant-garde Dutch
The pronunciation of /7i/ by male and female speakers of avant-garde Dutch Vincent J. van Heuven, Loulou Edelman and Renée van Bezooijen Leiden University/ ULCL (van Heuven) / University of Nijmegen/ CLS
More informationPREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES
PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationLanguage Acquisition by Identical vs. Fraternal SLI Twins * Karin Stromswold & Jay I. Rifkin
Stromswold & Rifkin, Language Acquisition by MZ & DZ SLI Twins (SRCLD, 1996) 1 Language Acquisition by Identical vs. Fraternal SLI Twins * Karin Stromswold & Jay I. Rifkin Dept. of Psychology & Ctr. for
More information