A Transfer Learning Approach for Under-Resourced Arabic Dialects Speech Recognition
Mohamed Elmahdy*, Mark Hasegawa-Johnson†, Eiman Mustafawi*
*Qatar University, Doha, Qatar
†University of Illinois at Urbana-Champaign, USA
melmahdy@ieee.org, jhasegaw@illinois.edu, eimanmus@qu.edu.qa

Abstract

A major problem with dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, we propose a transfer learning framework to jointly use a large amount of Modern Standard Arabic (MSA) data and a little amount of dialectal Arabic data to improve acoustic and language modeling. We have chosen the Qatari Arabic (QA) dialect as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus has been collected and transcribed from several Qatari TV series and talk-show programs. A large vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique was performed by applying orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach achieves more than 28% relative reduction in WER.

Keywords: dialectal Arabic, acoustic modeling, language modeling, adaptation, cross-lingual

1. Introduction

Arabic is the largest still-living Semitic language in terms of the number of speakers. More than 300 million people use Arabic as their first native language, and it is the 6th most widely used language based on the number of first-language speakers. Modern Standard Arabic (MSA) is currently considered the formal Arabic variety across all Arabic speakers. MSA is used in news broadcasts, newspapers, formal speech, books, movie subtitling, and whenever the target audience or readers come from different nationalities. Practically, MSA is not the natural spoken language for native Arabic speakers; MSA is always a second language for all Arabic speakers. In fact, dialectal (or colloquial) Arabic is the natural spoken variety of Arabic in everyday-life communications.
A significant problem in Arabic automatic speech recognition (ASR) is the existence of many different Arabic dialects (Egyptian, Levantine, Iraqi, Gulf, etc.). Every country has its own dialect, and usually there exist different dialects within the same country. Moreover, the different Arabic dialects are only spoken and not formally written, and significant phonological, morphological, syntactic, and lexical differences exist between the dialects and the standard form. This situation is called diglossia (Ferguson, 1959). Because of the diglossic nature of dialectal Arabic, little research has been done in dialectal Arabic ASR, or in the use of dialect in any natural language processing tasks. For MSA, on the other hand, a lot of research has been conducted. The limited research done for dialectal Arabic ASR is also due to the sparsity of dialectal speech resources for training different ASR models. To tackle the problem of data sparsity, Kirchhoff and Vergyri (2005) proposed a cross-lingual approach where they used pooled MSA and dialectal speech data in training the acoustic model and achieved around 3% relative reduction in WER. Similarly, in (Huang and Hasegawa-Johnson, 2012), a joint cross-lingual training method based on the similarity between phonemes in MSA and dialectal speech data also showed improvements in phone classification tasks. Elmahdy et al. (2010) proposed another cross-lingual approach based on acoustic model adaptation, which resulted in about 12% relative reduction in WER. Acoustic model adaptation can perform better than data pooling when dialectal speech data are very limited compared to existing MSA data, and adaptation may avoid dialectal acoustic features being masked by the large MSA data, as in the data pooling approach. In the DARPA GALE project (Mangu et al., 2011), the acoustic model was trained using a large amount of speech data collected from various news channels. Evaluation was performed on news speech and conversational speech. Conversational speech is mostly spontaneous and includes a significant percentage of dialectal Arabic as well as MSA.
However, the system was not evaluated or adapted with a specific under-resourced Arabic dialect. Moreover, most of the conversational data in the GALE project come from news broadcasts, and we have noticed that the majority of speakers tend to speak in MSA rather than in their own Arabic dialect. In this paper, we have chosen Qatari Arabic (QA), the Arabic dialect spoken in Qatar and a subvariety of the Gulf dialect, as a typical example of an under-resourced Arabic dialect. Despite the huge differences between QA and MSA, we show how to benefit from large existing MSA speech and text resources. In the proposed framework, MSA data and QA data are jointly used in training improved acoustic and language models for QA. Since transcription conventions may be different between MSA and dialectal Arabic, we show how to apply phone mapping across MSA and dialectal Arabic. In addition, we propose to apply data pooling followed by
acoustic model adaptation for cross-lingual acoustic modeling, and interpolation for cross-lingual language modeling. Our assumption is that the contribution of limited dialectal speech data in a pooled acoustic model depends on the ratio between MSA data and dialectal data. Usually, there are far more data available in MSA than in the dialect, so we expect little contribution of dialectal data to the final pooled acoustic model. In order to boost the weight of dialectal features, acoustic model adaptation techniques are applied on the pooled acoustic model using dialectal speech data. All our experiments have been conducted with QA in the domain of TV broadcasts. The remainder of this paper is organized as follows: Section 2 introduces the MSA and QA speech corpora. Sections 3 and 4 present our speech recognition system and the baseline approach, respectively. Our proposed cross-lingual language modeling and acoustic modeling are discussed in Sections 5 and 6, respectively. Section 7 discusses the experimental results. Section 8 concludes this study.

2. Speech Corpora

2.1. Modern Standard Arabic

The MSA corpus has been collected from the domain of news broadcasts. The corpus consists of two speech resources from the European Language Resources Association (ELRA). All resources are recorded in linear PCM format, 16 kHz, and 16 bit. The ELRA speech resources are: the NEMLAR Broadcast News Speech Corpus, which consists of about 40 hours from different radio stations (Medi1, Radio Orient, Radio Monte Carlo, and Radio Television Maroc), and the NetDC Arabic Broadcast News Speech Corpus, which contains about 22.5 hours recorded from Radio Orient. The detailed composition of the resources is shown in Table 1.

Source               Duration (hrs)
Radio Orient         34.6
Medi1                 9.5
Radio Monte Carlo     9.0
Radio Tele. Maroc     9.3
Total                62.4

Table 1. Composition of the MSA speech corpus.

2.2. Qatari Arabic Corpus

We have collected the QA corpus from different TV series and talk-show programs.
Data are selected from programs in which the majority of speech segments is in QA; segments from each program are selected after audition confirms the quality of the speech signal. The programs are: Tesaneef (a popular Qatari series, almost 100% in QA), Sabah El-Doha (a talk show, almost 80% in QA), and some episodes from Al-Jazeerah, selected if the guest speakers are speaking the Qatari dialect. The corpus is recorded in linear PCM, 16 kHz, and 16 bits. The overall length is 15 hours. The detailed composition is shown in Table 2. Transcription is performed manually in traditional Arabic orthography. Five more Persian letters are used to indicate non-standard Arabic consonants: the letter چ denotes the /ʧ/ consonant, گ denotes /ɡ/, ڤ denotes /v/, ژ denotes /ʒ/, and پ denotes /p/. Some diacritic marks are added for ambiguous words. The following non-speech filler tags are transcribed: pause, breath, laugh, ah, noise, and music. Speech segmentation is done with a 10-second maximum for each segment, delimited by filler tags. The QA corpus is divided into a training set of 13 hours, a development set of 1 hour, and an evaluation set of 2 hours. The training set is used either to train the QA baseline acoustic model or to adapt an existing MSA acoustic model.

Source                     Duration (hrs)
Tesaneef series             9.3
Sabah El-Doha talk show     2.0
Al-Jazeerah programs        3.7
Total                      15.0

Table 2. Composition of the QA corpus.

3. System Description

Our system is a GMM-HMM architecture based on the Kaldi speech recognition engine (Povey et al., 2011). Acoustic models are all fully continuous density context-dependent tri-phones with 3 states per HMM, trained with Maximum Mutual Information Estimation (MMIE). The feature vector consists of the standard 39-dimensional MFCC coefficients. During acoustic model training, linear discriminant analysis (LDA) and maximum likelihood linear transform (MLLT) are applied to reduce dimensionality, which improves accuracy as well as recognition speed. Feature-space MLLR (fMLLR) was used for Speaker Adaptive Training (SAT) of the acoustic models.
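The LDA step in the front end above can be illustrated with a minimal numpy sketch. This only shows the objective (maximizing between-class scatter relative to within-class scatter); Kaldi's actual implementation operates on spliced MFCC frames with HMM-state class labels, and the function name here is hypothetical.

```python
import numpy as np

def lda_projection(X: np.ndarray, y: np.ndarray, out_dim: int) -> np.ndarray:
    """Estimate an LDA projection that maximizes between-class scatter
    relative to within-class scatter, keeping the top directions."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb v = lambda Sw v
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-eigvals.real)
    return eigvecs.real[:, order[:out_dim]]

# Two toy classes in 4 dimensions, projected down to 2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.repeat([0, 1], 100)
W = lda_projection(X, y, out_dim=2)
print(W.shape)  # (4, 2)
```

The leading projected dimension concentrates the class separation, which is what makes the reduced feature space cheaper to model without hurting discrimination.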
The first decoding pass uses a relatively small language model of around 800K n-grams. Then, in the second pass, the generated trigram lattices are rescored against a larger trigram model of around 10M n-grams.

4. Baseline System

4.1. Acoustic Modeling

We have adopted grapheme-based acoustic modeling (also known as graphemic modeling). Graphemic modeling is an acoustic modeling approach where the phonetic transcription is approximated by the word graphemes rather than the exact phoneme sequence. Short vowels and geminations are assumed to be implicitly modeled in the acoustic model (Vergyri et al., 2005; Billa et al., 2002). The baseline acoustic model is trained with the QA training set. The optimized numbers of tied-states and Gaussian mixtures per state are found to be 1000 and 8, respectively. Each grapheme letter is mapped to a unique model, resulting in a total number of 41 base units (36 letters in the standard Arabic alphabet and 5 Persian letters).

4.2. Language Modeling

The language model is a backoff tri-gram model with Modified Kneser-Ney smoothing. The baseline language model has been trained with the transcriptions of the QA training set (65K words). The vocabulary size is about 15.5K unique words. LM training parameters have been optimized to minimize the perplexity of the QA development set. The evaluation of the language model against the transcriptions of the evaluation set results in an OOV rate of 22.2%, whilst on the development set it results in an OOV rate of 18.4%, as shown in Table 4. We could not observe any improvement in speech recognition accuracy by increasing the order to 4-grams, apparently because the limited amount of QA training text results in more sparsity in higher-order n-grams.

4.3. Evaluation Settings

For the QA baseline system, batch decoding resulted in a WER of 61.7% on the QA development set and 80.8% on the evaluation set, as shown in Table 3. By examining the results, we find that about 1.0% of the errors are caused by either: the different forms of Alef (e.g. ا instead of أ), final Teh Marbuta (ة instead of ه or vice versa), or final Alef Maksura (ى instead of ي or vice versa). Since there is no standard orthographic form for dialectal Arabic and these kinds of errors are already common orthographic variants in dialectal Arabic, we decided to ignore these types of errors by normalizing both hypothesis and reference, before alignment, as follows:

- Normalizing all forms of Hamzated Alef (أ, إ, آ) to ا.
- Normalizing final Yeh ي to Alef Maksura ى.
- Normalizing Teh Marbuta ة to Heh ه.

After applying orthographic normalization, the absolute WER decreases to 60.9% on the dev. set, a 1.3% relative reduction, and 79.9% on the eval. set, a 1.1% relative reduction, as shown in Table 3.

        QA Baseline    + Orthographic norm.
dev.    61.7%          60.9%
eval.   80.8%          79.9%

Table 3. Word Error Rate (WER) (%) of the QA baseline system with and without orthographic normalization on the development set and the evaluation set.

5. Cross-Lingual Language Modeling

In the baseline system, a significant percentage of errors is mainly due to the high OOV rate that exceeds 18%. In an attempt to improve the LM, we trained an MSA trigram LM using the LDC Gigaword corpus (Parker et al., 2009), which consists of more than 800M words. The MSA vocabulary consists of the top 256K words in the corpus. The evaluation of the MSA LM resulted in OOV rates of 22.3% and 22.1% on the dev. and eval. sets respectively, as shown in Table 4. In order to decrease the OOV rate, we linearly interpolated the QA LM and the MSA LM. Interpolation weights were optimized on the dev. set. The cross-lingual interpolation resulted in a vocabulary size of 265.7K words. The OOV rate is significantly decreased to 8.9% and 9.2% on the dev. and eval. sets respectively, as shown in Table 4. Using the cross-lingual MSA/QA LM, batch decoding resulted in absolute WERs of 56.0% and 64.4% on the dev. and eval. sets respectively, a significant relative reduction of 3.6% and 16.3% compared to the baseline, as shown in Table 5.

LM        Vocab.    OOV dev. (%)    OOV eval. (%)
QA        15.5K     18.4            22.2
MSA       256K      22.3            22.1
QA/MSA    265.7K     8.9             9.2

Table 4. Language model evaluation on the development set and the evaluation set.

6. Cross-Lingual Acoustic Modeling

6.1. MSA Acoustic Model

In this section, we describe how to use an MSA acoustic model to decode QA speech. Initially, that is not possible because of the mismatch between the phone sets of MSA and QA.
This mismatch is solved by applying phone mapping. Consonants that do not exist in MSA have been mapped to the closest ones in MSA as follows: /ɡ/ and /ʒ/ are mapped to /ʤ/; /ʧ/ is mapped to /t/ followed by /ʃ/; /v/ is mapped to /f/; /p/ is mapped to /b/. After applying QA phone mapping, an MSA graphemic acoustic model is trained using the 62.4-hour MSA corpus. Decoding results are absolute WERs of 61.9% and 81.3% on the dev. and eval. sets respectively, a 1.6% and 1.8% relative increase compared to the QA baseline, as shown in Table 5. This relative increase is expected, as the MSA acoustic model does not yet cover all QA dialect-specific features.
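The orthographic normalization of Section 4.3 and the QA-to-MSA phone mapping above can be sketched together as follows. The helper names are hypothetical; the character and phone mappings themselves follow the rules stated in the text.

```python
# Hamzated Alef forms that collapse to bare Alef
HAMZATED_ALEF = "\u0623\u0625\u0622"  # أ إ آ -> ا

def normalize_word(word: str) -> str:
    """Collapse common dialectal spelling variants to one canonical form."""
    for ch in HAMZATED_ALEF:
        word = word.replace(ch, "\u0627")  # all Hamzated Alef forms -> ا
    if word.endswith("\u064A"):            # final Yeh ي -> Alef Maksura ى
        word = word[:-1] + "\u0649"
    if word.endswith("\u0629"):            # final Teh Marbuta ة -> Heh ه
        word = word[:-1] + "\u0647"
    return word

# Dialect-only consonants mapped to their closest MSA counterparts;
# the affricate /ʧ/ is split into the two-phone sequence /t/ /ʃ/.
PHONE_MAP = {"ɡ": ["ʤ"], "ʒ": ["ʤ"], "ʧ": ["t", "ʃ"], "v": ["f"], "p": ["b"]}

def map_phones(phones):
    """Rewrite a QA phone sequence into the MSA phone inventory."""
    mapped = []
    for p in phones:
        mapped.extend(PHONE_MAP.get(p, [p]))
    return mapped

print(map_phones(["ʧ", "a", "v"]))  # ['t', 'ʃ', 'a', 'f']
```

Since both hypothesis and reference are normalized before alignment, the choice of canonical character (e.g. ي versus ى) does not affect scoring, only consistency.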
6.2. Data Pooling

In data pooling acoustic modeling, we jointly trained the acoustic model using both QA and MSA data. Decoding results are absolute WERs of 56.6% and 64.4% on the dev. and eval. sets respectively, outperforming the baseline by a relative decrease of 7.1% and 19.4%, as shown in Table 5.

6.3. Acoustic Model Adaptation

In this section, we apply acoustic model adaptation techniques on the MSA model using QA speech data. Maximum Likelihood Linear Regression (MLLR) (Leggetter and Woodland, 1995) followed by Maximum A-Posteriori (MAP) re-estimation (Lee and Gauvain, 1993) is applied. Decoding results are absolute WERs of 57.3% and 65.9% on the dev. and eval. sets respectively, outperforming the baseline by a relative decrease of 5.9% and 17.5%, as shown in Table 5.

6.4. Combined Data Pooling and Acoustic Model Adaptation

Data pooling and acoustic model adaptation are combined in this section: acoustic model adaptation is applied on the MSA/QA pooled model rather than the MSA model. Decoding results are absolute WERs of 55.6% and 62.5% on the dev. and eval. sets respectively, outperforming the baseline by a significant relative decrease of 8.7% and 21.8%, as shown in Table 5.

6.5. System Combination

In this section, we combine different systems to further improve accuracy using Minimum Bayes-Risk (MBR) decoding (Goel and Byrne, 2000). MBR is applied on the lattices generated from two systems: 1. the QA AM (sys. 1 in Table 5), and 2. the QA/MSA pool/adapt AM (sys. 5 in Table 5). In both systems, the QA/MSA interpolated LM is used. System combination using lattice MBR resulted in absolute WERs of 47.9% and 56.8% on the dev. and eval. sets respectively, outperforming the baseline system by a relative decrease of 21.3% and 28.9%, as shown in Table 5.

sys.   AM                   dev.     eval.
1      QA                   56.0%    64.4%
2      MSA                  61.9%    81.3%
3      QA/MSA pool          56.6%    64.4%
4      QA/MSA adapt         57.3%    65.9%
5      QA/MSA pool/adapt    55.6%    62.5%
1+5    MBR                  47.9%    56.8%

Table 5. WER on the QA dev. and eval. sets using the QA/MSA LM and various acoustic model configurations.

The strategy of data pooling, followed by MLLR+MAP adaptation, is equivalent to a type of iterative transformation and adaptive re-weighting of the QA data relative to the MSA data.
For example, the mean vector of Gaussian k, computed by the final stage of MAP adaptation, is given by

\hat{\mu}_k = \frac{\sum_{t=1}^{T} \gamma_t(k)\, x_t + \tau A \mu_k}{\sum_{t=1}^{T} \gamma_t(k) + \tau},   (1)

where x_t, 1 ≤ t ≤ T, is a dialectal feature vector, γ_t(k) is the posterior probability of Gaussian k given x_t, τ is the weight of the prior, μ_k is the mean prior to adaptation, and A is the corresponding MLLR transformation. But notice that μ_k, in turn, is given by

\mu_k = \frac{1}{N_k} \sum_{t=1}^{T+S} \tilde{\gamma}_t(k)\, x_t, \qquad N_k = \sum_{t=1}^{T+S} \tilde{\gamma}_t(k),   (2)

where x_t, for T+1 ≤ t ≤ T+S, is an MSA feature vector, and γ̃_t(k) is the weighting coefficient computed during the last round of maximum-likelihood EM training applied to the pooled MSA and QA datasets. By combining Eqs. (1) and (2), we discover that MAP adaptation is similar to an adaptive re-weighting scheme, such that QA feature vectors are weighted comparably to MSA feature vectors during the initial EM training, then transformed by A, and then re-weighted to an increased final weight of γ_t(k) + τ γ̃_t(k)/N_k. The effective weight of each MSA datum is similarly decreased, during MAP adaptation, to only τ γ̃_t(k)/N_k. The effect of this iterative strategy is to give greater weight to MSA data during the initial training of the model, when the MSA data may be useful to help the learning algorithm avoid spurious local optima in the likelihood function; after the model parameters have converged to a solution that is optimal for the pooled MSA+QA data, MLLR improves the representation of QA data, and, finally, MAP is used to increase the relative importance of QA data in the final training criterion.

7. Discussion

Even though the differences between MSA and Arabic dialects are large, to the extent that we can consider Arabic dialects as totally different languages (Ferguson, 1959), we can still benefit from MSA speech resources to improve dialectal Arabic speech recognition. The performance of the data pooling approach may be affected by the ratio of the dialectal data amount to the MSA data amount. In our case, the data pooling approach results in an absolute WER of 56.6% on the dev. set and 64.4% on the eval. set. The MSA data amount is about five times the amount of dialectal data.
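The MAP mean update of Eq. (1) can be illustrated with a toy numeric sketch. All quantities below are hypothetical (a single Gaussian, 2-dimensional features, and an identity MLLR transform); the point is only how τ trades off the transformed pooled prior against the dialectal statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one Gaussian, 2-D features, T dialectal frames
T, dim, tau = 50, 2, 10.0
x = rng.normal(loc=1.0, size=(T, dim))   # QA feature vectors x_t
gamma = rng.uniform(0.5, 1.0, size=T)    # posteriors gamma_t(k)
mu_pooled = np.zeros(dim)                # mean mu_k from pooled MSA+QA training
A = np.eye(dim)                          # MLLR transform (identity here)

# MAP re-estimation of the mean, Eq. (1):
mu_map = (gamma @ x + tau * (A @ mu_pooled)) / (gamma.sum() + tau)

# The adapted mean lies between the pooled prior and the QA data mean;
# a larger tau keeps it closer to the prior, a smaller tau moves it
# toward the dialectal statistics.
```

With τ on the order of the accumulated posterior mass, the dialectal data and the pooled prior contribute comparably, which matches the re-weighting interpretation given above.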
In order to boost the contribution of dialectal data, MLLR and MAP adaptation are then applied on the pooled acoustic model, effectively increasing the weight of dialectal acoustic features in the final cross-lingual model. The combination of data pooling followed by acoustic model adaptation results in a lower absolute
WER of 55.6% on the dev. set and 62.5% on the eval. set. Lattice MBR decoding contributes a further reduction in WER, achieving 47.9% on the dev. set and 56.8% on the eval. set.

8. Conclusions and Future Work

In this paper, we propose a speech recognition system for Qatari Colloquial Arabic (QA). To overcome the limitation of dialectal resources, the proposed method utilizes MSA data through cross-dialectal phone mapping, data pooling, acoustic model adaptation, and system combination, and has achieved 21.3% and 28.9% relative WER reductions on the QA development set and evaluation set respectively. For future work, it is possible to extend the current framework to other dialectal speech recognition systems. Moreover, some future directions are to incorporate recent achievements in transfer learning and domain adaptation to further improve system performance (Pan and Yang, 2010). In addition, the cross-lingual training and adaptation can be bidirectional; a multi-task framework of Arabic speech recognition can be formulated so that both MSA and dialectal recognition performance can be enhanced simultaneously (Caruana, 1997).

9. Acknowledgment

This publication was made possible by a grant from the Qatar National Research Fund under its National Priorities Research Program (NPRP). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Qatar National Research Fund. We would also like to acknowledge the European Language Resources Association (ELRA) and the Linguistic Data Consortium (LDC) for providing us with data resources.

References

Billa, J., Noamany, M., Srivastava, A., Liu, D., Stone, R., Xu, J., Makhoul, J. and Kubala, F. (2002). Audio indexing of Arabic broadcast news. Proceedings of ICASSP, vol. 1.
Caruana, R. (1997). Multitask learning. Machine Learning, vol. 28, no. 1.
Elmahdy, M., Gruhn, R., Minker, W. and Abdennadher, S. (2010). Cross-lingual acoustic modeling for dialectal Arabic speech recognition. Proceedings of INTERSPEECH.
Ferguson, C.A. (1959). Diglossia. Word, vol. 15.
Goel, V. and Byrne, W. (2000).
Minimum Bayes-Risk automatic speech recognition. Computer Speech and Language, 14(2).
Huang, P.-S. and Hasegawa-Johnson, M. (2012). Cross-dialectal data transferring for Gaussian mixture model training in Arabic speech recognition. International Conference on Arabic Language Processing.
Kirchhoff, K. and Vergyri, D. (2005). Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition. Speech Communication, vol. 46(1).
Lee, C.-H. and Gauvain, J.-L. (1993). Speaker adaptation based on MAP estimation of HMM parameters. Proceedings of ICASSP, vol. II.
Leggetter, C.J. and Woodland, P.C. (1995). Maximum likelihood linear regression for speaker adaptation of the parameters of continuous density hidden Markov models. Computer Speech and Language, vol. 9.
Mangu, L., Kuo, H.-K., Chu, S., Kingsbury, B., Saon, G., Soltau, H. and Biadsy, F. (2011). The IBM 2011 GALE Arabic speech transcription system. Proceedings of ASRU.
NEMLAR Broadcast News Speech Corpus, ELRA catalog reference: ELRA-S0219, http://catalog.elra.info/product_info.php?products_id=874
NetDC Arabic BNSC (Broadcast News Speech Corpus), ELRA catalog reference: ELRA-S0157, http://catalog.elra.info/product_info.php?products_id=13
Pan, S.J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10.
Parker, R., Graff, D., Chen, K., Kong, J. and Maeda, K. (2009). Arabic Gigaword Fourth Edition. Linguistic Data Consortium, Pennsylvania, LDC Catalog No.: LDC2009T30.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G. and Vesely, K. (2011). The Kaldi speech recognition toolkit. Proceedings of IEEE ASRU.
Vergyri, D., Kirchhoff, K., Gadde, R., Stolcke, A. and Zheng, J. (2005). Development of a conversational telephone speech recognizer for Levantine Arabic. Proceedings of INTERSPEECH.
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationArabic Orthography vs. Arabic OCR
Arabic Orthography vs. Arabic OCR Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among
More informationSpeech Translation for Triage of Emergency Phonecalls in Minority Languages
Speech Translation for Triage of Emergency Phonecalls in Minority Languages Udhyakumar Nallasamy, Alan W Black, Tanja Schultz, Robert Frederking Language Technologies Institute Carnegie Mellon University
More informationADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION
ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION Mitchell McLaren 1, Yun Lei 1, Luciana Ferrer 2 1 Speech Technology and Research Laboratory, SRI International, California, USA 2 Departamento
More informationDivision of Arts, Humanities & Wellness Department of World Languages and Cultures. Course Syllabus اللغة والثقافة العربية ١ LAN 115
Division of Arts, Humanities & Wellness Department of World Languages and Cultures Course Syllabus Semester and Year: Course and Section number: Meeting Times: INSTRUCTOR: Office Location: Phone: Office
More informationDOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS. Elliot Singer and Douglas Reynolds
DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS Elliot Singer and Douglas Reynolds Massachusetts Institute of Technology Lincoln Laboratory {es,dar}@ll.mit.edu ABSTRACT
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationA Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition Abir Masmoudi 1,2, Mariem Ellouze Khemakhem 1,Yannick Estève 2, Lamia Hadrich Belguith 1 and Nizar Habash 3 (1) ANLP Research group,
More informationDNN ACOUSTIC MODELING WITH MODULAR MULTI-LINGUAL FEATURE EXTRACTION NETWORKS
DNN ACOUSTIC MODELING WITH MODULAR MULTI-LINGUAL FEATURE EXTRACTION NETWORKS Jonas Gehring 1 Quoc Bao Nguyen 1 Florian Metze 2 Alex Waibel 1,2 1 Interactive Systems Lab, Karlsruhe Institute of Technology;
More informationUsing Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing
Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, and Alan W Black Carnegie Mellon University,
More informationLetter-based speech synthesis
Letter-based speech synthesis Oliver Watts, Junichi Yamagishi, Simon King Centre for Speech Technology Research, University of Edinburgh, UK O.S.Watts@sms.ed.ac.uk jyamagis@inf.ed.ac.uk Simon.King@ed.ac.uk
More informationNoisy SMS Machine Translation in Low-Density Languages
Noisy SMS Machine Translation in Low-Density Languages Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of
More informationA Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language
A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language Z.HACHKAR 1,3, A. FARCHI 2, B.MOUNIR 1, J. EL ABBADI 3 1 Ecole Supérieure de Technologie, Safi, Morocco. zhachkar2000@yahoo.fr.
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationWhat do Medical Students Need to Learn in Their English Classes?
ISSN - Journal of Language Teaching and Research, Vol., No., pp. 1-, May ACADEMY PUBLISHER Manufactured in Finland. doi:.0/jltr...1- What do Medical Students Need to Learn in Their English Classes? Giti
More informationCOPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS
COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS Joris Pelemans 1, Kris Demuynck 2, Hugo Van hamme 1, Patrick Wambacq 1 1 Dept. ESAT, Katholieke Universiteit Leuven, Belgium
More informationReading Horizons. A Look At Linguistic Readers. Nicholas P. Criscuolo APRIL Volume 10, Issue Article 5
Reading Horizons Volume 10, Issue 3 1970 Article 5 APRIL 1970 A Look At Linguistic Readers Nicholas P. Criscuolo New Haven, Connecticut Public Schools Copyright c 1970 by the authors. Reading Horizons
More informationLikelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition Seltzer, M.L.; Raj, B.; Stern, R.M. TR2004-088 December 2004 Abstract
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationSTUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH
STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160
More informationAutomatic Assessment of Spoken Modern Standard Arabic
Automatic Assessment of Spoken Modern Standard Arabic Jian Cheng, Jared Bernstein, Ulrike Pado, Masanori Suzuki Pearson Knowledge Technologies 299 California Ave, Palo Alto, CA 94306 jian.cheng@pearson.com
More informationLEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano
LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES Judith Gaspers and Philipp Cimiano Semantic Computing Group, CITEC, Bielefeld University {jgaspers cimiano}@cit-ec.uni-bielefeld.de ABSTRACT Semantic parsers
More informationBooks Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny
By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from
More informationInternational Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012
Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of
More informationEli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology
ISCA Archive SUBJECTIVE EVALUATION FOR HMM-BASED SPEECH-TO-LIP MOVEMENT SYNTHESIS Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano Graduate School of Information Science, Nara Institute of Science & Technology
More informationProblems of the Arabic OCR: New Attitudes
Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing
More informationA hybrid approach to translate Moroccan Arabic dialect
A hybrid approach to translate Moroccan Arabic dialect Ridouane Tachicart Mohammadia school of Engineers Mohamed Vth Agdal University, Rabat, Morocco tachicart@gmail.com Karim Bouzoubaa Mohammadia school
More informationVimala.C Project Fellow, Department of Computer Science Avinashilingam Institute for Home Science and Higher Education and Women Coimbatore, India
World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 1, 1-7, 2012 A Review on Challenges and Approaches Vimala.C Project Fellow, Department of Computer Science
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationA Student s Assistant for Open e-learning
T4E 2009 Aparna Lalingar IIITB * Bangalore, India e-mail: aparna.l@iiitb.ac.in A Student s Assistant for Open e-learning Srinivasan Ramani IIITB * and HP Labs India Bangalore, India e-mail: ramanisl@vsnl.com
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationBridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models
Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &
More informationUsing EEG to Improve Massive Open Online Courses Feedback Interaction
Using EEG to Improve Massive Open Online Courses Feedback Interaction Haohan Wang, Yiwei Li, Xiaobo Hu, Yucong Yang, Zhu Meng, Kai-min Chang Language Technologies Institute School of Computer Science Carnegie
More informationACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS
ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS Annamaria Mesaros 1, Toni Heittola 1, Antti Eronen 2, Tuomas Virtanen 1 1 Department of Signal Processing Tampere University of Technology Korkeakoulunkatu
More informationMalicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon s Outlier Detection Method
Malicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon s Outlier Detection Method Sanket S. Kalamkar and Adrish Banerjee Department of Electrical Engineering
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationIEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL XXX, NO. XXX,
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL XXX, NO. XXX, 2017 1 Small-footprint Highway Deep Neural Networks for Speech Recognition Liang Lu Member, IEEE, Steve Renals Fellow,
More informationListening and Speaking Skills of English Language of Adolescents of Government and Private Schools
Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools Dr. Amardeep Kaur Professor, Babe Ke College of Education, Mudki, Ferozepur, Punjab Abstract The present
More informationPhonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project
Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California
More informationAtypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty
Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu
More informationDropout improves Recurrent Neural Networks for Handwriting Recognition
2014 14th International Conference on Frontiers in Handwriting Recognition Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham,Théodore Bluche, Christopher Kermorvant, and Jérôme
More informationTEKS Correlations Proclamation 2017
and Skills (TEKS): Material Correlations to the Texas Essential Knowledge and Skills (TEKS): Material Subject Course Publisher Program Title Program ISBN TEKS Coverage (%) Chapter 114. Texas Essential
More informationUNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS. Heiga Zen, Haşim Sak
UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS Heiga Zen, Haşim Sak Google fheigazen,hasimg@google.com ABSTRACT Long short-term
More informationSpeech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers
Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers October 31, 2003 Amit Juneja Department of Electrical and Computer Engineering University of Maryland, College Park,
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationGrade 4. Common Core Adoption Process. (Unpacked Standards)
Grade 4 Common Core Adoption Process (Unpacked Standards) Grade 4 Reading: Literature RL.4.1 Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George
More informationAUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders
More informationSpeaker recognition using universal background model on YOHO database
Aalborg University Master Thesis project Speaker recognition using universal background model on YOHO database Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationSCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany
Journal of Reading Behavior 1980, Vol. II, No. 1 SCHEMA ACTIVATION IN MEMORY FOR PROSE 1 Michael A. R. Townsend State University of New York at Albany Abstract. Forty-eight college students listened to
More informationNCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches
NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science
More informationUTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation
UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation Taufiq Hasan Gang Liu Seyed Omid Sadjadi Navid Shokouhi The CRSS SRE Team John H.L. Hansen Keith W. Godin Abhinav Misra Ali Ziaei Hynek Bořil
More information