CURRICULUM VITAE


Zhen-Hua Ling ( 凌震华 )
Associate Professor
National Engineering Laboratory of Speech and Language Information Processing
University of Science and Technology of China
Hefei, Anhui, P.R. China
zhling@ustc.edu.cn

Research Interests
My research interests include speech signal processing, speech synthesis, voice conversion, speech analysis, and speech coding. My current work focuses on statistical-model-based speech synthesis methods.

Education Experiences
06/2002 B.E. degree in electronic information engineering, University of Science and Technology of China (USTC), Hefei, China
06/2005 M.S. degree in signal and information processing, University of Science and Technology of China (USTC), Hefei, China
06/2008 Ph.D. degree in signal and information processing, University of Science and Technology of China (USTC), Hefei, China

Research Experiences
02/2011~present, associate professor at University of Science and Technology of China
08/2012~08/2013, visiting scholar at University of Washington, U.S.A.
07/2008~02/2011, postdoctoral researcher at University of Science and Technology of China
10/2007~04/2008, Marie Curie Fellow at the Centre for Speech Technology Research (CSTR), University of Edinburgh, U.K.
09/2002~06/2008, research assistant at iflytek Speech Lab, University of Science and Technology of China, Hefei, China

Research Projects
National Natural Science Foundation of China Hierarchical Speech Synthesis Method Combining Speech Production Mechanism and Statistical Acoustic Modeling (Grant No ), , Committee of NNSFC, PI;
National Natural Science Foundation of China Royal Society of Edinburgh Joint Project Unified articulatory-acoustic modelling for flexible and controllable speech synthesis, (Grant No ), , Committee of NNSFC, PI;
National Natural Science Foundation of China Statistical speech synthesis with articulatory modeling (Grant No ), , Committee of NNSFC, PI;

China Postdoctoral Science Foundation Speech synthesis based on automatic evaluation of synthetic performance (Grant No ), , Committee of China Postdoctoral Science Foundation, PI;
Sub-project of Hi-Tech Research and Development Program of China Key technique research and product development for multi-lingual speech synthesis (Grant No. 2006AA010104), , Ministry of Science and Technology of China;
Hi-Tech Research and Development Program of China HMM-based expressive and multi-lingual speech synthesis (Grant No. 2006AA01Z137), , Ministry of Science and Technology of China;
National Natural Science Foundation of China Expressive and Multi-Lingual Prosodic Modelling (Grant No ), , Committee of NNSFC.

Awards
Key techniques and applied development platform for intelligent speech interaction, the Second Prize of the National Science and Technology Progress Award (2011)
IEEE Signal Processing Society Young Author Best Paper Award (2010)
Key techniques and application platform for intelligent speech interaction, the First Prize of Science and Technology Progress of Anhui Province (2008)
Chinese and English speech evaluation techniques for language learning, the Second Prize of Electronic and Information Science Progress, Chinese Institute of Electronics (2008)
President Scholarship of Chinese Academy of Sciences (June 2007)
Outstanding Master Graduate Student of University of Science and Technology of China (June 2005)
Guanghua Scholarship of University of Science and Technology of China (June 2003)

Academic Services
Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing, Jan ~ Jan
ISCA (International Speech Communication Association) Communication Committee Member, 2014
Program Committee Member of 8th ISCA Speech Synthesis Workshop, 2013
Scientific Committee Member of 6th International Conference on Speech Prosody, 2012
Session chair of ICASSP (2014), Interspeech (2014, 2012)
Reviewer for international journals and conferences: IEEE Transactions on Audio, Speech, and Language Processing, Speech Communication, Computer Speech and Language, Information Sciences, Journal of Signal Processing Systems, ICASSP, Interspeech, ISCSLP, Speech Prosody, etc.

Publications
[International Journals]
[1] Zhen-Hua Ling, Shi-Yin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, Li Deng, "Deep Learning for Acoustic Modeling in Parametric Speech Generation," IEEE Signal Processing Magazine, accepted.

[2] Ling-Hui Chen, Zhen-Hua Ling, Li-Juan Liu, and Li-Rong Dai, "Voice Conversion Using Deep Neural Networks with Layer-Wise Generative Training," IEEE Transactions on Audio, Speech, and Language Processing, accepted.
[3] Xian-Jun Xia, Zhen-Hua Ling, Yuan Jiang, and Li-Rong Dai, "HMM-based Unit Selection Speech Synthesis Using Log Likelihood Ratios Derived from Perceptual Data," Speech Communication, vol , pp ,
[4] Chen-Yu Yang, Zhen-Hua Ling, and Li-Rong Dai, "Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs," IEICE Transactions on Information and Systems, vol. E97-D, no. 6, pp ,
[5] Zhen-Hua Ling, Li Deng, and Dong Yu, "Modeling Spectral Envelopes Using Restricted Boltzmann Machines and Deep Belief Networks for Statistical Parametric Speech Synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp ,
[6] Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi, "Articulatory Control of HMM-based Parametric Speech Synthesis using Feature-Space-Switched Multiple Regression," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 1, pp ,
[7] Zhen-Hua Ling and Li-Rong Dai, "Minimum Kullback-Leibler Divergence Parameter Generation for HMM-based Speech Synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 5, pp ,
[8] Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi, "An analysis of HMM-based prediction of articulatory movements," Speech Communication, vol. 52, no. 10, pp ,
[9] Heng Lu, Zhen-Hua Ling, Li-Rong Dai, and Ren-Hua Wang, "Cross-Validation and Minimum Generation Error based Decision Tree Pruning for HMM-based Speech Synthesis," Computational Linguistics and Chinese Language Processing, vol. 15, no. 1, pp , March
[10] Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, and Ren-Hua Wang, "Integrating articulatory features into HMM-based parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp , (IEEE Signal Processing Society 2010 Young Author Best Paper Award)
[11] Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, and Steve Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp ,

[Domestic Journals]
[1] Ming-Qi Cai, Zhen-Hua Ling, and Li-Rong Dai, "Research on HMM-based Articulatory Movement Prediction for Chinese," Chinese Journal of Data Acquisition and Processing, vol. 29, no. 2, pp , (in Chinese)
[2] Yang Song, Zhen-Hua Ling, and Li-Rong Dai, "Optimization method for unit selection speech synthesis based on synthesis quality predictions," Journal of Tsinghua University (Sci & Tech), vol. 53, no. 6, pp , (in Chinese)
[3] Ling-Hui Chen, Zhen-Hua Ling, and Li-Rong Dai, "Voice Conversion Based on Speaker Independent Model," Chinese Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 3, pp (in Chinese)

[4] Yuan-Ping Zhang, Zhen-Hua Ling, Li-Rong Dai, and Qing-Feng Liu, "Improved decision tree based method for English prosodic phrase boundary prediction," Chinese Journal of Application Research of Computers, vol. 29, no. 8, pp , (in Chinese)
[5] Hai-Bo Liu, Hui Li, and Zhen-Hua Ling, "The research on pitch extraction method for voice activity detection based on periodic decomposition," Journal of University of Science and Technology of China, vol. 42, no. 2, pp , (in Chinese)
[6] Yu Hu, Zhen-Hua Ling, Ren-Hua Wang, and Li-Rong Dai, "Acoustic Statistical Modeling Based Speech Synthesis Technologies," Journal of Chinese Information Processing, vol. 25, no. 6, pp , (in Chinese)
[7] Chen-Yu Yang, Li-Xin Zhu, Zhen-Hua Ling, and Li-Rong Dai, "Automatic phrase boundary labeling for a Mandarin TTS corpus using the Viterbi decoding algorithm," Journal of Tsinghua University (Sci & Tech), vol. 51, no. 9, pp , (in Chinese)
[8] Hang Liu, Zhen-Hua Ling, Wu Guo, and Li-Rong Dai, "An improved cross-language model adaptation method for speech synthesis," Chinese Journal of Pattern Recognition and Artificial Intelligence, vol. 24, no. 4, 2011. (in Chinese)
[9] Heng Lu, Zhen-Hua Ling, Ming Lei, Li-Rong Dai, and Ren-Hua Wang, "Minimum generation error based optimization of HMM model clustering for speech synthesis," Chinese Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 6, pp , (in Chinese)
[10] Huan-Huan Zhao, Zhen-Hua Ling, Ren-Hua Wang, and Li-Rong Dai, "MAP-based speaker adaptation in speech synthesis," Chinese Journal of Data Acquisition and Processing, vol. 25, no. 4, pp , (in Chinese)
[11] Ming Lei, Zhen-Hua Ling, and Li-Rong Dai, "Minimum generation error training based on perceptually weighted line spectral pair distance for statistical parametric speech synthesis," Chinese Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 4, pp , (in Chinese)
[12] Ren-Hua Wang, Li-Rong Dai, Zhen-Hua Ling, and Yu Hu, "Trainable unit selection speech synthesis under statistical framework," Chinese Science Bulletin, vol. 54, no. 11, pp ,
[13] Zhen-Hua Ling and Ren-Hua Wang, "Statistical acoustic model based unit selection algorithm for speech synthesis," Chinese Journal of Pattern Recognition and Artificial Intelligence, vol. 21, no. 3, pp , (in Chinese)
[14] Wei Zhang, Zhen-Hua Ling, Guo-Ping Hu, and Ren-Hua Wang, "A synthesis instance pruning approach based on virtual non-uniform replacements," Tsinghua Science and Technology, vol. 13, no. 4, pp ,
[15] Ren-Hua Wang, Li-Rong Dai, Yu Hu, and Zhen-Hua Ling, "Acoustic statistical modeling based new generation speech synthesis technology," Journal of University of Science and Technology of China, vol. 38, no. 7, pp , (in Chinese)
[16] Bin Zhou, Li-Rong Dai, Zhen-Hua Ling, and Ren-Hua Wang, "Novel glottal analyzing algorithm for natural utterance," Chinese Journal of Data Acquisition and Processing, vol. 20, no. 3, pp , (in Chinese)
[17] Zhen-Hua Ling, Zhi-Wei Shuang, Bin Zhou, Ren-Hua Wang, "A wideband speech coding algorithm based on adaptive interpolation of weighted spectrum," Chinese Journal of Data Acquisition and Processing, vol. 20, no. 1, pp , (in Chinese)

[18] Dong-Lai Zhu, Ren-Hua Wang, Zhen-Hua Ling, and Wei Li, "Putonghua prosodic word pitch model based on HMM," Chinese Journal of Acoustics, vol. 27, no. 6, pp , (in Chinese)

[International Conferences]
[1] Ling-Hui Chen, Zhen-Hua Ling, Yi-Qing Zu, Run-Qiang Yan, Yuan Jiang, Xian-Jun Xia, Ying Wang, "The USTC System for Blizzard Challenge 2014," in Blizzard Challenge Workshop,
[2] Ming-Qi Cai, Zhen-Hua Ling, and Li-Rong Dai, "Formant-Controlled Speech Synthesis Using Hidden Trajectory Model," Interspeech, pp ,
[3] Xin Wang, Zhen-Hua Ling, and Li-Rong Dai, "Concept-to-Speech Generation by Integrating Syntagmatic Features into HMM-Based Speech Synthesis," Interspeech, pp ,
[4] Ling-Hui Chen, Zhen-Hua Ling, and Li-Rong Dai, "Voice Conversion Using Generative Trained Deep Neural Networks with Multiple Frame Spectral Envelopes," Interspeech, pp ,
[5] Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, and Li-Rong Dai, "Modeling DCT Parameterized F0 Trajectory at Intonation Phrase Level with DNN or Decision Tree," Interspeech, pp ,
[6] Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, and Zhen-Hua Ling, "DNN-based stochastic postfilter for HMM-based speech synthesis," Interspeech, pp ,
[7] Li Gao, Zhen-Hua Ling, Ling-Hui Chen, Li-Rong Dai, "Improving F0 Prediction Using Bidirectional Associative Memories and Syllable-Level F0 Features for HMM-based Mandarin Speech Synthesis," in ISCSLP,
[8] Yu-Sheng Sun, Zhen-Hua Ling, Xiang Yin, Li-Rong Dai, "Integrating Global Variance of Log Power Spectrum Derived from LSPs into MGE Training for HMM-Based Parametric Speech Synthesis," in ISCSLP,
[9] Xiang Yin, Zhen-Hua Ling, and Li-Rong Dai, "Spectral Modeling Using Neural Autoregressive Distribution Estimators for Statistical Parametric Speech Synthesis," in ICASSP, pp ,
[10] Li-Juan Liu, Ling-Hui Chen, Zhen-Hua Ling, and Li-Rong Dai, "Using Bidirectional Associative Memories for Joint Spectral Envelope Modeling in Voice Conversion," in ICASSP, pp ,
[11] Ling-Hui Chen, Zhen-Hua Ling, Yuan Jiang, Yang Song, Xian-Jun Xia, Yi-Qing Zu, Run-Qiang Yan, and Li-Rong Dai, "The USTC System for Blizzard Challenge 2013," in Blizzard Challenge Workshop,
[12] Maria Astrinaki, Alexis Moinet, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, and Thierry Dutoit, "Mage - Reactive articulatory feature control of HMM-based parametric speech synthesis," in 8th ISCA Speech Synthesis Workshop, pp ,
[13] Ling-Hui Chen, Zhen-Hua Ling, Yan Song, and Li-Rong Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Interspeech, pp , 2013.

[14] Korin Richmond, Zhen-Hua Ling, Junichi Yamagishi, Benigno Uria, "On the Evaluation of Inversion Mapping Performance in the Acoustic Domain," in Interspeech, pp ,
[15] Zhen-Hua Ling, Li Deng, and Dong Yu, "Modeling Spectral Envelopes Using Restricted Boltzmann Machines for Statistical Parametric Speech Synthesis," in ICASSP, pp ,
[16] Chen-Yu Yang, Zhen-Hua Ling, and Li-Rong Dai, "Unsupervised Prosodic Phrase Boundary Labeling of Mandarin Speech Synthesis Database Using Context-Dependent HMM," in ICASSP, pp ,
[17] Xin Wang, Zhen-Hua Ling, and Li-Rong Dai, "Cross-Stream Dependency Modeling Using Continuous F0 Model for HMM-Based Speech Synthesis," in Proc. of ISCSLP,
[18] Xian-Jun Xia, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai, "Improved Unit Selection Speech Synthesis Method Utilizing Subjective Evaluation Results on Synthetic Speech," in Proc. of ISCSLP,
[19] Ming-Qi Cai, Zhen-Hua Ling, and Li-Rong Dai, "Target-filtering model based articulatory movement prediction for articulatory control of HMM-based speech synthesis," in Proc. of the 11th International Conference on Signal Processing,
[20] Zhen-Hua Ling, Xian-Jun Xia, Yang Song, Chen-Yu Yang, Ling-Hui Chen, and Li-Rong Dai, "The USTC System for Blizzard Challenge 2012," in Proc. of Blizzard Challenge workshop,
[21] Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, "Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis," Interspeech
[22] Xiang Yin, Zhen-Hua Ling, Ming Lei, Li-Rong Dai, "Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis," Interspeech
[23] Ling-Hui Chen, Chen-Yu Yang, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai, Yu Hu, and Ren-Hua Wang, "The USTC system for Blizzard Challenge 2011," in Proc. of Blizzard Challenge workshop,
[24] Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, "Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-based Speech Synthesis," in Proc. of Interspeech, pp ,
[25] Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai, "Estimation of Window Coefficients for Dynamic Feature Extraction for HMM based Speech Synthesis," in Proc. of Interspeech, pp ,
[26] Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai, "Formant-controlled HMM-based Speech Synthesis," in Proc. of Interspeech, pp ,
[27] Heng Lu, Zhen-Hua Ling, Li-Rong Dai, Ren-Hua Wang, "Building HMM based Unit-Selection Speech Synthesis System Using Synthetic Speech Naturalness Evaluation Score," in Proc. of ICASSP, pp ,
[28] Ming Lei, Zhen-Hua Ling, Li-Rong Dai, "Preserve Ordering Property of Generated LSPs for Minimum Generation Error Training in HMM-based Speech Synthesis," in Proc. of ICASSP, pp ,
[29] Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai, "Non-Parallel Training for Voice Conversion based on FT-GMM," in Proc. of ICASSP, pp , 2011.

[30] Zhen-Hua Ling, Zhi-Guo Wang, Li-Rong Dai, "Statistical Modeling of Syllable-Level F0 Features for HMM-based Unit Selection Speech Synthesis," in Proc. of ISCSLP, pp ,
[31] Ling-Hui Chen, Zhen-Hua Ling, Wu Guo, Li-Rong Dai, "GMM-based Voice Conversion with Explicit Modelling on Feature Transform," in Proc. of ISCSLP, pp ,
[32] Chen-Yu Yang, Zhen-Hua Ling, Heng Lu, Wu Guo, Li-Rong Dai, "Automatic Phrase Boundary Labeling for Mandarin TTS Corpus Using Context-Dependent HMM," in Proc. of ISCSLP 2010, pp
[33] Tian-Yi Zhao, Zhen-Hua Ling, Ming Lei, Li-Rong Dai, Qing-Feng Liu, "Minimum Generation Error Training for HMM-based Prediction of Articulatory Movements," in Proc. of ISCSLP, pp ,
[34] Ming Lei, Yi-Jian Wu, Zhen-Hua Ling, Li-Rong Dai, "Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis," in Proc. of International Conference on Signal Processing, pp ,
[35] Zhen-Hua Ling, Yu Hu, and Li-Rong Dai, "Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis," in Proc. of Interspeech, pp ,
[36] Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi, "HMM-based Text-to-Articulatory-Movement Prediction and Analysis of Critical Articulators," in Proc. of Interspeech, pp ,
[37] Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, and Ren-Hua Wang, "Automatic Error Detection for Unit Selection Speech Synthesis Using Log Likelihood Ratio based SVM Classifier," in Proc. of Interspeech, pp ,
[38] Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, and Li-Rong Dai, "A Hierarchical F0 Modeling Method for HMM-based Speech Synthesis," in Proc. of Interspeech, pp ,
[39] Yuan Jiang, Zhen-Hua Ling, Ming Lei, Cheng-Cheng Wang, Heng Lu, Yu Hu, Li-Rong Dai, Ren-Hua Wang, "The USTC system for Blizzard Challenge 2010," in Proc. of Blizzard Challenge workshop,
[40] Ming Lei, Zhen-Hua Ling, and Li-Rong Dai, "Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis," in Proc. of ICASSP, pp ,
[41] Cheng-Cheng Wang, Zhen-Hua Ling, and Li-Rong Dai, "Asynchronous F0 and spectrum modeling for HMM-based speech synthesis," in Proc. of Interspeech, pp ,
[42] Heng Lu, Zhen-Hua Ling, Ming Lei, Cheng-Cheng Wang, Huan-Huan Zhao, Ling-Hui Chen, Yu Hu, Li-Rong Dai, and Ren-Hua Wang, "The USTC system for Blizzard Challenge 2009," in Proc. of Blizzard Challenge workshop,
[43] Long Qin, Yi-Jian Wu, Zhen-Hua Ling, and Ren-Hua Wang, "Model adaptation for HMM-based speech synthesis under minimum generation error criterion," in Proc. of IEEE International Symposium on Multimedia, pp ,
[44] Zhen-Hua Ling, Wei Zhang, and Ren-Hua Wang, "Cross-stream dependency modeling for HMM-based speech synthesis," in Proc. of ISCSLP, pp. 5-8,
[45] Cheng-Cheng Wang, Zhen-Hua Ling, Bu-Fan Zhang, and Li-Rong Dai, "Multi-layer F0 modeling for HMM-based speech synthesis," in Proc. of ISCSLP, pp , 2008.

[46] Heng Lu, Zhen-Hua Ling, Si Wei, Yu Hu, Li-Rong Dai, and Ren-Hua Wang, "Heteronym verification for Mandarin speech synthesis," in Proc. of ISCSLP, pp ,
[47] Wei Zhang, Zhen-Hua Ling, and Li-Rong Dai, "Constructing scalable TTS system based on corpus approach," in Proc. of IEEE International Conference on Cybernetics and Intelligent Systems, pp ,
[48] Zhen-Hua Ling, Heng Lu, Guo-Ping Hu, Li-Rong Dai, and Ren-Hua Wang, "The USTC entry for Blizzard Challenge 2008," in Proc. of Blizzard Challenge workshop,
[49] Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, and Ren-Hua Wang, "Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge," in Proc. of Interspeech, pp ,
[50] Junichi Yamagishi, Zhen-Hua Ling, and Simon King, "Robustness of HMM-based speech synthesis," in Proc. of Interspeech, pp ,
[51] Zhen-Hua Ling and Ren-Hua Wang, "Minimum unit selection error training for HMM-based unit selection speech synthesis system," in Proc. of ICASSP, pp ,
[52] Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang, and Li-Rong Dai, "Minimum generation error linear regression based model adaptation for HMM-based speech synthesis," in Proc. of ICASSP, pp ,
[53] Zhen-Hua Ling, Long Qin, Heng Lu, Yu Gao, Li-Rong Dai, Ren-Hua Wang, Yuan Jiang, Zhi-Wei Zhao, Jin-Hui Yang, Jie Chen, and Guo-Ping Hu, "The USTC and iflytek speech synthesis systems for Blizzard Challenge 2007," in Proc. of Blizzard Challenge workshop,
[54] Zhen-Hua Ling and Ren-Hua Wang, "HMM-based unit selection combining Kullback-Leibler divergence with likelihood criterion," in Proc. of ICASSP, pp ,
[55] Long Qin, Zhen-Hua Ling, Yi-Jian Wu, Bu-Fan Zhang, and Ren-Hua Wang, "HMM-based emotional speech synthesis using average emotion model," in Proc. of International Symposium on Chinese Spoken Language Processing (ISCSLP), pp ,
[56] Bu-Fan Zhang, Zhen-Hua Ling, Long Qin, and Ren-Hua Wang, "Applying SFC model for Chinese expressive speech synthesis," in Proc. of ISCSLP,
[57] Zhen-Hua Ling, Yi-Jian Wu, Yu-Ping Wang, Long Qin, and Ren-Hua Wang, "USTC system for Blizzard Challenge an improved HMM-based speech synthesis method," in Proc. of Blizzard Challenge workshop,
[58] Zhen-Hua Ling and Ren-Hua Wang, "HMM-based unit selection using frame sized speech segments," in Proc. of Interspeech, pp ,
[59] Long Qin, Yi-Jian Wu, Zhen-Hua Ling, and Ren-Hua Wang, "Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix format," in Proc. of Interspeech, pp ,
[60] Zhen-Hua Ling, Yu Hu, and Ren-Hua Wang, "A novel source analysis method by matching spectral characters of LF model with STRAIGHT spectrum," in Proc. of the First International Conference on Affective Computing & Intelligent Interaction (ACII), Lecture Notes in Computer Science, vol. 3784, pp ,
[61] Yu-Ping Wang, Zhen-Hua Ling, and Ren-Hua Wang, "Emotional speech synthesis based on improved codebook mapping voice conversion," in Proc. of the ACII, Lecture Notes in Computer Science, vol. 3784, pp , 2005.

[62] Long Qin, Gao-Peng Chen, Zhen-Hua Ling, and Li-Rong Dai, "An improved spectral and prosodic transformation method in STRAIGHT-based voice conversion," in Proc. of ICASSP, vol. 1, pp ,
[63] Zhen-Hua Ling, Yu-Ping Wang, Yu Hu, and Ren-Hua Wang, "Modeling glottal effect on the spectral envelop of STRAIGHT using mixture of Gaussians," in Proc. of ISCSLP, pp ,
[64] Zhen-Hua Ling, Yu Hu, Zhi-Wei Shuang, and Ren-Hua Wang, "Compression of speech database by feature separation and pattern clustering using STRAIGHT," in Proc. of Interspeech, pp ,
[65] Zhi-Wei Shuang, Zi-Xiang Wang, Zhen-Hua Ling, and Ren-Hua Wang, "A novel voice conversion system based on codebook mapping with phoneme-tied weighting," in Proc. of Interspeech, pp ,
[66] Zhen-Hua Ling, Yu Hu, Zhi-Wei Shuang, and Ren-Hua Wang, "Decision tree based unit pre-selection in Mandarin Chinese synthesis," in Proc. of ISCSLP,
[67] Zhi-Wei Shuang, Yu Hu, Zhen-Hua Ling, and Ren-Hua Wang, "A miniature Chinese TTS system based on tailored corpus," in Proc. of ICSLP, pp ,

[Domestic Conferences]
[1] Jiang Yuan, Shuang-Hua Zhu, Zhen-Hua Ling and Li-Rong Dai, "Research on Improving Methods for HMM Based Unit Selection Speech Synthesis," in Proc. of the 11th National Conference on Man-Machine Speech Communication, (in Chinese)
[2] Zhen-Hua Ling, Yu Hu, "An experimental study on the similarity performance of HMM-based parametric speech synthesis," in Proc. of the 10th National Conference on Man-Machine Speech Communication,
[3] Huan-Huan Zhao, Zhen-Hua Ling, Long Qin, Ren-Hua Wang, and Li-Rong Dai, "Eigenvoice based model adaptation method for voice conversion in speech synthesis," in Proc. of the 9th National Conference on Man-Machine Speech Communication, (in Chinese)
[4] Yu Gao, Zhen-Hua Ling, Li-Rong Dai, and Ren-Hua Wang, "An improved sinusoidal speech analysis-synthesis method," in Proc. of the 9th National Conference on Man-Machine Speech Communication, (in Chinese)
[5] Wei Zhang, Zhen-Hua Ling, Guo-Ping Hu, and Ren-Hua Wang, "Synthesis instances pruning approach based on virtual non-uniform replacing," in Proc. of the 9th National Conference on Man-Machine Speech Communication,
[6] Heng Lu, Wei Zhang, Zhen-Hua Ling, Ren-Hua Wang, and Li-Rong Dai, "An HMM-based speech synthesis system using multi-gaussian modeling and selection," in Proc. of the 9th National Conference on Man-Machine Speech Communication, (in Chinese)
[7] Bin Zhou, Zhen-Hua Ling, Zhi-Wei Shuang, and Ren-Hua Wang, "Research on speech synthesizer based on inverse filtering and LF model," in Proc. of the 7th National Conference on Man-Machine Speech Communication, (in Chinese)
[8] Ren-Hua Wang, Yu Hu, Wei Li, and Zhen-Hua Ling, "Corpus based Chinese speech synthesis system using decision tree," in Proc. of the 6th National Conference on Man-Machine Speech Communication, (in Chinese)

A study of speaker adaptation for DNN-based speech synthesis

A study of speaker adaptation for DNN-based speech synthesis A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

More information

Speech Emotion Recognition Using Support Vector Machine

Speech Emotion Recognition Using Support Vector Machine Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

More information

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National

More information

UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS. Heiga Zen, Haşim Sak

UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS. Heiga Zen, Haşim Sak UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS Heiga Zen, Haşim Sak Google fheigazen,hasimg@google.com ABSTRACT Long short-term

More information

Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence

Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence INTERSPEECH September,, San Francisco, USA Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence Bidisha Sharma and S. R. Mahadeva Prasanna Department of Electronics

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION Mitchell McLaren 1, Yun Lei 1, Luciana Ferrer 2 1 Speech Technology and Research Laboratory, SRI International, California, USA 2 Departamento

More information

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion

More information

Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction

Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction INTERSPEECH 2015 Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa Department of Computer

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Edinburgh Research Explorer

Edinburgh Research Explorer Edinburgh Research Explorer Personalising speech-to-speech translation Citation for published version: Dines, J, Liang, H, Saheer, L, Gibson, M, Byrne, W, Oura, K, Tokuda, K, Yamagishi, J, King, S, Wester,

More information

DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE

DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE Shaofei Xue 1

More information

Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition

Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition Hua Zhang, Yun Tang, Wenju Liu and Bo Xu National Laboratory of Pattern Recognition Institute of Automation, Chinese

More information

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

International Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012

Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models

Navdeep Jaitly 1, Vincent Vanhoucke 2, Geoffrey Hinton 1,2 1 University of Toronto 2 Google Inc. ndjaitly@cs.toronto.edu,

Human Emotion Recognition From Speech

RESEARCH ARTICLE OPEN ACCESS Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING

Gábor Gosztolya 1, Tamás Grósz 1, László Tóth 1, David Imseng 2 1 MTA-SZTE Research Group on Artificial

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK. Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren

Speech Technology and Research Laboratory, SRI International,

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,

Letter-based speech synthesis

Oliver Watts, Junichi Yamagishi, Simon King Centre for Speech Technology Research, University of Edinburgh, UK O.S.Watts@sms.ed.ac.uk jyamagis@inf.ed.ac.uk Simon.King@ed.ac.uk

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing

Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, and Alan W Black Carnegie Mellon University,

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology

ISCA Archive SUBJECTIVE EVALUATION FOR HMM-BASED SPEECH-TO-LIP MOVEMENT SYNTHESIS

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

Statistical Parametric Speech Synthesis

Heiga Zen a,b, Keiichi Tokuda a, Alan W. Black c a Department of Computer Science and Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya,

Eileen Bau CIE/USA-DFW 2014

Eileen Bau Frisco Liberty High School, 10th Grade DECA International Development Career Conference (2013 and 2014) 1st Place Editor/Head of Communications (LHS Key Club) Grand Champion at International

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

Investigation on Mandarin Broadcast News Speech Recognition

Mei-Yuh Hwang 1, Xin Lei 1, Wen Wang 2, Takahiro Shinozaki 1 1 Univ. of Washington, Dept. of Electrical Engineering, Seattle, WA 98195 USA 2

The IRISA Text-To-Speech System for the Blizzard Challenge 2017

Pierre Alain, Nelly Barbot, Jonathan Chevelu, Gwénolé Lecorvé, Damien Lolive, Claude Simon, Marie Tahon IRISA, University of Rennes 1 (ENSSAT),

Calibration of Confidence Measures in Speech Recognition

Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

Australian Journal of Basic and Applied Sciences

AENSI Journals ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean

INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT

Takuya Yoshioka, Anton Ragni, Mark J. F. Gales Cambridge University Engineering Department, Cambridge, UK NTT Communication

A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language

Z.HACHKAR 1,3, A. FARCHI 2, B.MOUNIR 1, J. EL ABBADI 3 1 Ecole Supérieure de Technologie, Safi, Morocco. zhachkar2000@yahoo.fr.

SEMI-SUPERVISED ENSEMBLE DNN ACOUSTIC MODEL TRAINING

Sheng Li 1, Xugang Lu 2, Shinsuke Sakai 1, Masato Mimura 1 and Tatsuya Kawahara 1 1 School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501,

Speech Recognition at ICSI: Broadcast News and beyond

Dan Ellis International Computer Science Institute, Berkeley CA Outline The DARPA Broadcast News task Aspects of ICSI

Distributed Learning of Multilingual DNN Feature Extractors using GPUs

Yajie Miao, Hao Zhang, Florian Metze Language Technologies Institute, School of Computer Science, Carnegie Mellon University Pittsburgh,

On the Formation of Phoneme Categories in DNN Acoustic Models

Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-

Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition

Yanzhang He, Eric Fosler-Lussier Department of Computer Science and Engineering The Ohio

WHEN THERE IS A mismatch between the acoustic

808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

The Current Situations of International Cooperation and Exchange and Future Expectations of Guangzhou Polytechnic of Sports

It plans to enroll students officially in 2015 Sports services and management

Affective Classification of Generic Audio Clips using Regression Models

Nikolaos Malandrakis 1, Shiva Sundaram, Alexandros Potamianos 3 1 Signal Analysis and Interpretation Laboratory (SAIL), USC, Los

/$ IEEE

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 8, NOVEMBER 2009 1567 Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

Unit Selection Synthesis Using Long Non-Uniform Units and Phonemic Identity Matching

Lukas Latacz, Yuk On Kong, Werner Verhelst Department of Electronics and Informatics (ETRO) Vrije Universiteit Brussel

Speaker recognition using universal background model on YOHO database

Aalborg University Master Thesis project Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,

Xinyu Tang. Education. Research Interests. Honors and Awards. Professional Experience

Xinyu Tang Parasol Laboratory Department of Computer Science Texas A&M University, TAMU 3112 College Station, TX 77843-3112 phone:(979)847-8835 fax: (979)458-0425 email: xinyut@tamu.edu url: http://parasol.tamu.edu/people/xinyut

Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm

Prof. Ch.Srinivasa Kumar Prof. and Head of department. Electronics and communication Nalanda Institute

Spoofing and countermeasures for automatic speaker verification

INTERSPEECH 2013 Nicholas Evans 1, Tomi Kinnunen 2 and Junichi Yamagishi 3,4 1 EURECOM, Sophia Antipolis, France 2 University of Eastern

Digital Signal Processing: Speaker Recognition Final Report (Complete Version)

Xinyu Zhou, Yuxin Wu, and Tiezheng Li Tsinghua University Contents 1 Introduction 2 Algorithms 2.1 VAD

Improvements to the Pruning Behavior of DNN Acoustic Models

Matthias Paulik Apple Inc., Infinite Loop, Cupertino, CA 954 mpaulik@apple.com Abstract This paper examines two strategies that positively influence

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

Multiple Intelligence Theory into College Sports Option Class in the Study To Class, for Example Table Tennis

LIANG Huawei School of Physical Education, Henan Polytechnic University, China, 454

Word Segmentation of Off-line Handwritten Documents

Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

Expressive speech synthesis: a review

Int J Speech Technol (2013) 16:237-260 DOI 10.1007/s10772-012-9180-2 D. Govind S.R. Mahadeva Prasanna Received: 31 May 2012 / Accepted: 11 October 2012 / Published

Vowel mispronunciation detection using DNN acoustic models with cross-lingual training

INTERSPEECH 2015 Shrikant Joshi, Nachiket Deo, Preeti Rao Department of Electrical Engineering, Indian Institute of

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,

Mandarin Lexical Tone Recognition: The Gating Paradigm

Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition

Product Feature-based Ratings for Opinion Summarization of E-Commerce Feedback Comments

Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM's Imperial College of Engineering &

Noise-Adaptive Perceptual Weighting in the AMR-WB Encoder for Increased Speech Loudness in Adverse Far-End Noise Conditions

26 24th European Signal Processing Conference (EUSIPCO) Emma Jokinen Department

Python Machine Learning

Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics Sebastian Raschka

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty

Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu

DNN ACOUSTIC MODELING WITH MODULAR MULTI-LINGUAL FEATURE EXTRACTION NETWORKS

Jonas Gehring 1 Quoc Bao Nguyen 1 Florian Metze 2 Alex Waibel 1,2 1 Interactive Systems Lab, Karlsruhe Institute of Technology;

THE enormous growth of unstructured data, including

INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2014, VOL. 60, NO. 4, PP. 321-326 Manuscript received September 1, 2014; revised December 2014. DOI: 10.2478/eletel-2014-0042 Deep Image Features in

arxiv: v1 [cs.lg] 7 Apr 2015

Transferring Knowledge from a RNN to a DNN William Chan 1, Nan Rosemary Ke 1, Ian Lane 1,2 Carnegie Mellon University 1 Electrical and Computer Engineering, 2 Language Technologies Institute Equal contribution

Identification of Opinion Leaders Using Text Mining Technique in Virtual Community

Chihli Hung Department of Information Management Chung Yuan Christian University Taiwan 32023, R.O.C. chihli@cycu.edu.tw

National Taiwan Normal University - List of Presidents

1st Chancellor Li Ji-gu (Term of Office: 1946.5 ~1948.6) Chancellor Li Ji-gu (1895-1968), former name Zong Wu, from Zhejiang, Shaoxing. Graduated

Speaker Identification by Comparison of Smart Methods. Abstract

Journal of mathematics and computer science 10 (2014), 61-71 Ali Mahdavi Meimand Amin Asadi Majid Mohamadi Department of Electrical Department of Computer

Application of Visualization Technology in Professional Teaching

LI Baofu, SONG Jiayong School of Energy Science and Engineering Henan Polytechnic University, P. R. China, 454000 libf@hpu.edu.cn Abstract:

Syntactic Patterns versus Word Alignment: Extracting Opinion Targets from Online Reviews

Kang Liu, Liheng Xu and Jun Zhao National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T Labs - Research 180 Park Avenue, Florham Park,

Speaker Recognition. Speaker Diarization and Identification

A dissertation submitted to the University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

Speech Translation for Triage of Emergency Phonecalls in Minority Languages

Udhyakumar Nallasamy, Alan W Black, Tanja Schultz, Robert Frederking Language Technologies Institute Carnegie Mellon University

Empirical research on implementation of full English teaching mode in the professional courses of the engineering doctoral students

Yunxia Zhang & Li Li College of Electronics and Information Engineering,

A Reinforcement Learning Variant for Control Scheduling

Aloke Guha Honeywell Sensor and System Development Center 3660 Technology Drive Minneapolis MN 55417 Abstract We present an algorithm based on reinforcement

Learning Methods for Fuzzy Systems

Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

Mining Association Rules in Student's Assessment Data

www.ijcsi.org 211 Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY

Sergey Levine Principal Adviser: Vladlen Koltun Secondary Adviser:

OTHER RESEARCH EXPERIENCE & AFFILIATIONS

Chun-Yu Ho Department of Economics University at Albany, SUNY Email: cho@albany.edu Website: https://sites.google.com/site/chunyuho/home Version: January 2017 EDUCATION PhD. Economics, Boston University,

Simulation of Multi-stage Flash (MSF) Desalination Process

Advances in Materials Physics and Chemistry, 2012, 2, 200-205 doi:10.4236/ampc.2012.24b052 Published Online December 2012 (http://www.scirp.org/journal/ampc)

ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS

Annamaria Mesaros 1, Toni Heittola 1, Antti Eronen 2, Tuomas Virtanen 1 1 Department of Signal Processing Tampere University of Technology Korkeakoulunkatu

Analysis of Speech Recognition Models for Real Time Captioning and Post Lecture Transcription

Wilny Wilson.P M.Tech Computer Science Student Thejus Engineering College Thrissur, India. Sindhu.S Computer

Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode

Diploma Thesis of Michael Heck At the Department of Informatics Karlsruhe Institute of Technology

Curriculum Vitae of Chiang-Ju Chien

Contact Information Affiliation: Department of Electronic Engineering, Huafan University, Taiwan Address: Department of Electronic Engineering, Huafan University,

Probabilistic Latent Semantic Analysis

Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

Assignment 1: Predicting Amazon Review Ratings

Richard Park r2park@acsmail.ucsd.edu February 23, 2015 1 Dataset Analysis The dataset selected for this assignment comes from the set of Amazon reviews for

Multi-View Features in a DNN-CRF Model for Improved Sentence Unit Detection on English Broadcast News

Guangpu Huang, Chenglin Xu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li Temasek Laboratories@NTU,

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL XXX, NO. XXX,

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL XXX, NO. XXX, 2017 1 Small-footprint Highway Deep Neural Networks for Speech Recognition Liang Lu Member, IEEE, Steve Renals Fellow,

Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers

October 31, 2003 Amit Juneja Department of Electrical and Computer Engineering University of Maryland, College Park,

A comparison of spectral smoothing methods for segment concatenation based speech synthesis

D.T. Chappell, J.H.L. Hansen, "Spectral Smoothing for Speech Segment Concatenation," Speech Communication, Volume 36, Issues 3-4, March 2002, Pages 343-373.

Wenguang Sun CAREER Award. National Science Foundation

Wenguang Sun Address: 401W Bridge Hall Department of Data Sciences and Operations Marshall School of Business University of Southern California Los Angeles, CA 90089-0809 Phone: (213) 740-0093 Fax: (213)

Dr. Tang has been an active member of CAPA since She was Co-Chair of Education Committee and Executive committee member ( ).

2015 CAPA Candidates Profiles For President-elect (alphabetic order): Dr. Ping Tang Dr. Ping Tang is a Professor at Department of Pathology and Laboratory Medicine, University of Rochester Medical Center,

Investigation and Analysis of College Students Cognition in Science and Technology Competitions

https://doi.org/10.3991/ijet.v12i07.7226 Hongwei Yue Wuyi University, Jiangmen, China Ken Cai * Zhongkai