A Transfer Learning Approach for Under-Resourced Arabic Dialects Speech Recognition

Mohamed Elmahdy *, Mark Hasegawa-Johnson, Eiman Mustafawi *
* Qatar University, Doha, Qatar
University of Illinois at Urbana-Champaign, USA
melmahdy@ieee.org, jhasegaw@illinois.edu, eimanmus@qu.edu.qa

Abstract

A major problem with dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, we propose a transfer learning framework to jointly use a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. We have chosen the Qatari Arabic (QA) dialect as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus has been collected and transcribed from several Qatari TV series and talk-show programs. A large vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique was performed by applying orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach can achieve more than 28% relative reduction in WER.

Keywords: dialectal Arabic, acoustic modeling, language modeling, adaptation, cross-lingual

1. Introduction

Arabic is the largest living Semitic language in terms of the number of speakers. More than 300 million people use Arabic as their first native language, and it is the 6th most widely used language based on the number of first-language speakers. Modern Standard Arabic (MSA) is currently considered the formal Arabic variety across all Arabic speakers. MSA is used in news broadcasts, newspapers, formal speech, books, movie subtitling, and whenever the target audience or readers come from different nationalities. In practice, MSA is not the natural spoken language for native Arabic speakers. MSA is always a second language for all Arabic speakers. In fact, dialectal (or colloquial) Arabic is the natural spoken variety of Arabic in everyday communication.
A significant problem in Arabic automatic speech recognition (ASR) is the existence of many different Arabic dialects (Egyptian, Levantine, Iraqi, Gulf, etc.). Every country has its own dialect, and usually different dialects exist within the same country. Moreover, the different Arabic dialects are only spoken and not formally written, and significant phonological, morphological, syntactic, and lexical differences exist between the dialects and the standard form. This situation is called diglossia (Ferguson, 1959). Because of the diglossic nature of dialectal Arabic, little research has been done in dialectal Arabic ASR, or in the use of dialect in any natural language processing task. For MSA, on the other hand, a lot of research has been conducted. The limited research on dialectal Arabic ASR is also due to the sparsity of dialectal speech resources for training different ASR models.

To tackle the problem of data sparsity, Kirchhoff and Vergyri (2005) proposed a cross-lingual approach where they used pooled MSA and dialectal speech data in training the acoustic model and achieved around 3% relative reduction in WER. Similarly, in (Huang and Hasegawa-Johnson, 2012), a joint cross-lingual training method based on the similarity between phonemes in MSA and dialectal speech data also showed improvements in phone classification tasks. Elmahdy et al. (2010) proposed another cross-lingual approach based on acoustic model adaptation, which resulted in about 12% relative reduction in WER. Acoustic model adaptation can perform better than data pooling when dialectal speech data are very limited compared to existing MSA data, and adaptation may avoid the masking of dialectal acoustic features by large MSA data that occurs in the data pooling approach.

In the DARPA GALE project (Mangu et al., 2011), the acoustic model was trained using a large amount of speech data collected from various news channels. Evaluation was performed on news speech and conversational speech. Conversational speech is mostly spontaneous and includes a significant percentage of dialectal Arabic as well as MSA.
However, the system was not evaluated on or adapted to a specific under-resourced Arabic dialect. Moreover, most of the conversational data in the GALE project come from news broadcasts, and we have noticed that the majority of speakers tend to speak in MSA rather than in their own Arabic dialect.

In this paper, we have chosen Qatari Arabic (QA) [1] as a typical example of an under-resourced Arabic dialect. Despite the huge differences between QA and MSA, we show how to benefit from large existing MSA speech and text resources. In the proposed framework, MSA data and QA data are jointly used in training improved acoustic and language models for QA. Since transcription conventions may differ between MSA and dialectal Arabic, we show how to apply phone mapping across MSA and dialectal Arabic. In addition, we propose to apply data pooling followed by acoustic model adaptation for cross-lingual acoustic modeling, and interpolation for cross-lingual language modeling.

[1] QA is the Arabic dialect spoken in Qatar, and it is a subvariety of the Gulf dialect.

Our assumption is that the contribution of limited dialectal speech data in a pooled acoustic model depends on the ratio between MSA data and dialectal data. Usually, far more data are available in MSA than in the dialect, so we expect little contribution of dialectal data to the final pooled acoustic model. In order to boost the weight of dialectal features, acoustic model adaptation techniques are applied on the pooled acoustic model using dialectal speech data. All our experiments have been conducted with QA in the domain of TV broadcasts.

The remainder of this paper is organized as follows: Section 2 introduces the MSA and QA speech corpora. Sections 3 and 4 present our speech recognition system and the baseline approach, respectively. Our proposed cross-lingual language modeling and acoustic modeling are discussed in Sections 5 and 6, respectively. Section 7 discusses the experimental results. Section 8 concludes this study.

2. Speech Corpora

2.1. Modern Standard Arabic

The MSA corpus has been collected from the domain of news broadcast. The corpus consists of two speech resources from the European Language Resources Association (ELRA). All resources are recorded in linear PCM format, 16 kHz, and 16 bit. The ELRA speech resources are:

- The NEMLAR Broadcast News Speech Corpus, which consists of about 40 hours from different radio stations: Medi1, Radio Orient, Radio Monte Carlo, and Radio Television Maroc.
- The NetDC Arabic Broadcast News Speech Corpus, which contains about 22.5 hours recorded from Radio Orient.

Detailed composition of the resources is shown in Table 1.

Source               Duration (hrs)
Radio Orient         34.6
Medi1                9.5
Radio Monte Carlo    9.0
Radio Tele. Maroc    9.3
Total                62.4

Table 1. Composition of the MSA speech corpus.

2.2. Qatari Arabic Corpus

We have collected the QA corpus from different TV series and talk show programs.
Data are selected from programs in which the majority of speech segments are in QA; segments from each program are selected after audition confirms the quality of the speech signal. The programs are: Tesaneef (a popular Qatari series with almost 100% in QA), Sabah El-Doha (a talk show with almost 80% in QA), and some episodes from Al-Jazeerah, selected if the guest speakers are speaking the Qatari dialect. The corpus is recorded in linear PCM, 16 kHz, and 16 bits. The overall length is 15 hours. Detailed composition is shown in Table 2.

Transcription is performed manually in traditional Arabic orthography. Five more Persian letters are used to indicate non-standard Arabic consonants. The letter چ denotes the /ʧ/ consonant, گ denotes /ɡ/, ڤ denotes /v/, ژ denotes /ʒ/, and پ denotes /p/. Some diacritic marks are added for ambiguous words. The following non-speech filler tags are transcribed: pause, breath, laugh, ah, noise, and music. Speech segmentation is done with a 10 second maximum for each segment delimited by filler tags.

The QA corpus is divided into a training set of 13 hours, a development set of 1 hour, and an evaluation set of 2 hours. The training set is used either to train the QA baseline acoustic model or to adapt the existing MSA acoustic model.

Source                     Duration (hrs)
Tesaneef series            9.3
Sabah El-Doha talk show    2.0
Al-Jazeerah programs       3.7
Total                      15.0

Table 2. Composition of the QA corpus.

3. System Description

Our system is a GMM-HMM architecture based on the Kaldi speech recognition engine (Povey et al., 2011). Acoustic models are all fully continuous density context-dependent tri-phones with 3 states per HMM, trained with Maximum Mutual Information Estimation (MMIE). The feature vector consists of the standard 39-dimensional MFCC coefficients. During acoustic model training, linear discriminant analysis (LDA) and maximum likelihood linear transform (MLLT) are applied to reduce dimensionality, which improves accuracy as well as recognition speed. Feature-space MLLR (fMLLR) was used for Speaker Adaptive Training (SAT) of the acoustic models.
The first decoding pass uses a relatively smaller language model of around 800K n-grams. Then, in the second pass, the generated trigram lattices are rescored against a larger trigram model of around 10M n-grams.

4. Baseline System

4.1. Acoustic Modeling

We have adopted grapheme-based acoustic modeling (also known as graphemic modeling). Graphemic modeling is an acoustic modeling approach where the phonetic transcription is approximated by the word graphemes rather than the exact phoneme sequence. Short vowels and geminations are assumed to be implicitly modeled in the acoustic model (Vergyri et al., 2005; Billa et al., 2002).

The baseline acoustic model is trained with the QA training set. The optimized numbers of tied states and Gaussian mixtures per state are found to be 1000 and 8, respectively. Each grapheme letter is mapped to a unique model, resulting in a total of 41 base units (36 letters in the standard Arabic alphabet and 5 Persian letters).

4.2. Language Modeling

The language model is a backoff tri-gram model with Modified Kneser-Ney smoothing. The baseline language model has been trained with the transcriptions of the QA training set (65K words). The vocabulary size is about 15.5K unique words. LM training parameters have been optimized to minimize the perplexity of the QA development set. The evaluation of the language model against the transcriptions of the evaluation set results in an OOV rate of 22.2% and a perplexity of […], whilst on the development set it results in an OOV rate of 18.4% and a perplexity of […], as shown in Table 4. We could not observe any improvement in speech recognition accuracy by increasing the order to 4-grams, apparently because the limited amount of QA training text results in more sparsity in higher-order n-grams.

4.3. Evaluation Settings

For the QA baseline system, batch decoding resulted in a WER of 61.7% on the QA development set and 80.8% on the evaluation set, as shown in Table 3. By examining the results, we find that about 1.0% of the errors are caused by either: the different forms of Alef (e.g. ا instead of أ), final Teh Marbuta (ة instead of ه or vice versa), or final Alef Maksura (ى instead of ي or vice versa). Since there is no standard orthographic form for dialectal Arabic and these kinds of errors are already common orthographic variants in dialectal Arabic, we decide to ignore these types of errors by normalizing both hypothesis and reference, before alignment, as follows:

- Normalizing all forms of Hamzated Alef (أ إ آ) to ا.
- Normalizing final Yeh ي to Alef Maksura ى.
- Normalizing Teh Marbuta ة to Heh ه.

After applying orthographic normalization, absolute WER decreases to 60.9% on the dev. set (1.3% relative reduction) and 79.9% on the eval. set (1.1% relative reduction), as shown in Table 3.

         QA Baseline    + Orthographic norm.
dev.     61.7%          60.9%
eval.    80.8%          79.9%

Table 3. Word Error Rate (WER) (%) evaluation of the QA baseline system with and without orthographic normalization on the development set and the evaluation set.

5. Cross-Lingual Language Modeling

In the baseline system, a significant percentage of errors is mainly due to the high OOV rate, which exceeds 18%. In an attempt to improve the LM, we trained an MSA trigram LM using the LDC Gigaword corpus (Parker et al., 2009), which consists of more than 800M words. The MSA vocabulary consists of the top 256K words in the corpus. The evaluation of the MSA LM resulted in a perplexity of […] and […] on the dev. and eval. sets respectively, as shown in Table 4. The OOV rate was found to be 22.3% and 22.1% on the dev. and eval. sets respectively, as shown in Table 4.

In order to decrease OOV, we have linearly interpolated the QA LM and the MSA LM. Interpolation weights were optimized on the dev. set. The cross-lingual interpolation resulted in a vocabulary size of 265.7K words. The OOV rate is significantly decreased to 8.9% and 9.2% on the dev. and eval. sets respectively, as shown in Table 4. The perplexity test resulted in […] and […] on the dev. and eval. sets respectively. Using the cross-lingual MSA/QA LM, batch decoding resulted in an absolute WER of 56.0% and 64.4% on the dev. and eval. sets respectively, with a significant relative reduction of 3.6% and 16.3% compared to the baseline, as shown in Table 5.

LM        Vocab.    Perp. dev.   Perp. eval.   OOV (%) dev.   OOV (%) eval.
QA        15.5K     […]          […]           18.4           22.2
MSA       256K      […]          […]           22.3           22.1
QA/MSA    265.7K    […]          […]           8.9            9.2

Table 4. Language model evaluation with the development set and the evaluation set.

6. Cross-Lingual Acoustic Modeling

6.1. MSA Acoustic Model

In this section, we describe how to use an MSA acoustic model to decode QA speech. Initially, that is not possible because of the mismatch between the phone sets of MSA and QA.
This mismatch is solved by applying phone mapping. Consonants that do not exist in MSA have been mapped to the closest ones in MSA as follows:

- /ɡ/ and /ʒ/ are mapped to /ʤ/.
- /ʧ/ is mapped to /t/ followed by /ʃ/.
- /v/ is mapped to /f/.
- /p/ is mapped to /b/.

After applying QA phone mapping, an MSA graphemic acoustic model is trained using the 62.4-hour MSA corpus. Decoding results are an absolute WER of 61.9% and 81.3% on the dev. and eval. sets respectively, a 1.6% and 1.8% relative increase compared to the QA baseline, as shown in Table 5. This relative increase is expected, as the MSA acoustic model does not yet cover all QA dialect-specific features.
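The phone mapping of this section can be written as a small lookup table. The following Python sketch is illustrative only and is not taken from the authors' system. The letter-to-consonant table and the phone mapping follow the text; the function name `map_word_to_msa_units` and the table `MSA_PHONE_TO_LETTER` (which MSA letter is assumed to carry each target phone in a graphemic model) are hypothetical choices made here.

```python
# QA-specific letters and the consonants they denote (from the corpus description).
QA_LETTER_TO_PHONE = {"چ": "ʧ", "گ": "ɡ", "ڤ": "v", "ژ": "ʒ", "پ": "p"}

# Phone mapping from this section: each QA-only consonant -> closest MSA phone(s).
QA_TO_MSA_PHONES = {
    "ɡ": ["ʤ"],
    "ʒ": ["ʤ"],
    "ʧ": ["t", "ʃ"],
    "v": ["f"],
    "p": ["b"],
}

# ASSUMPTION (not stated in the text): the MSA letter used for each target phone.
MSA_PHONE_TO_LETTER = {"ʤ": "ج", "t": "ت", "ʃ": "ش", "f": "ف", "b": "ب"}


def map_word_to_msa_units(word):
    """Return the MSA base units for a QA word, one unit per list entry.

    Letters that already exist in the MSA unit set pass through unchanged;
    the five QA-specific letters are replaced by one or two MSA letters.
    """
    units = []
    for letter in word:
        phone = QA_LETTER_TO_PHONE.get(letter)
        if phone is None:
            units.append(letter)  # ordinary Arabic letter: keep as is
        else:
            units.extend(MSA_PHONE_TO_LETTER[p] for p in QA_TO_MSA_PHONES[phone])
    return units
```

With this table, a word containing چ contributes the two units ت and ش, so the MSA model can score QA transcriptions without any new base unit.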

6.2. Data Pooling

In data pooling acoustic modeling, we have jointly trained the acoustic model using both QA and MSA data. Decoding results are an absolute WER of 56.6% and 64.4% on the dev. and eval. sets respectively, outperforming the baseline by a relative decrease of 7.1% and 19.4%, as shown in Table 5.

6.3. Acoustic Model Adaptation

In this section, we apply acoustic model adaptation techniques on the MSA model using QA speech data. Maximum Likelihood Linear Regression (MLLR) (Leggetter and Woodland, 1995) followed by Maximum A-Posteriori (MAP) re-estimation (Lee and Gauvain, 1993) is applied. Decoding results are an absolute WER of 57.3% and 65.9% on the dev. and eval. sets respectively, outperforming the baseline by a relative decrease of 5.9% and 17.5%, as shown in Table 5.

6.4. Combined Data Pooling and Acoustic Model Adaptation

Data pooling and acoustic model adaptation have been combined in this section. Acoustic model adaptation is applied on the MSA/QA pooled model rather than the MSA model. Decoding results are an absolute WER of 55.6% and 62.5% on the dev. and eval. sets respectively, outperforming the baseline by a significant relative decrease of 8.7% and 21.8%, as shown in Table 5.

6.5. System Combination

In this section, we combine different systems to further improve accuracy using Minimum Bayes-Risk (MBR) decoding (Goel and Byrne, 2000). MBR is applied on the generated lattices from the two systems:

1. QA AM (sys. 1 in Table 5).
2. QA/MSA pool/adapt AM (sys. 5 in Table 5).

In both systems, the QA/MSA interpolated LM is used. System combination using lattice MBR resulted in an absolute WER of 47.9% and 56.8% on the dev. and eval. sets respectively, outperforming the baseline system by a relative decrease of 21.3% and 28.9%, as shown in Table 5.

sys.    AM                   dev.    eval.
1       QA                   56.0    64.4
2       MSA                  61.9    81.3
3       QA/MSA pool          56.6    64.4
4       QA/MSA adapt         57.3    65.9
5       QA/MSA pool/adapt    55.6    62.5
1+5     MBR                  47.9    56.8

Table 5. WER (%) on the QA dev. and eval. sets using the QA/MSA LM and various acoustic model configurations.

The strategy of data pooling, followed by MLLR+MAP adaptation, is equivalent to a type of iterative transformation and adaptive re-weighting of the QA data relative to the MSA data.
For example, the mean vector of the k-th Gaussian, computed by the final stage of MAP adaptation, is given by

$$\hat{\mu}_k = \frac{\sum_{t=1}^{T} \gamma_k(t)\, x_t + \tau A \mu_k}{\sum_{t=1}^{T} \gamma_k(t) + \tau}, \qquad (1)$$

where $x_t$, $1 \le t \le T$, is a dialectal feature vector, $\gamma_k(t)$ is the posterior probability of the k-th Gaussian given $x_t$, $\tau$ is the weight of the prior, $\mu_k$ is the mean prior to adaptation, and $A$ is the corresponding MLLR transformation. But notice that $\mu_k$, in turn, is given by

$$\mu_k = \frac{1}{N_k} \sum_{t=1}^{T+S} \xi_k(t)\, x_t, \qquad N_k = \sum_{t=1}^{T+S} \xi_k(t), \qquad (2)$$

where $x_t$, for $T+1 \le t \le T+S$, is an MSA feature vector, and $\xi_k(t)$ is the weighting coefficient computed during the last round of maximum-likelihood EM training applied to the pooled MSA and QA datasets. By combining Eq. (1) and (2), we discover that MAP adaptation is similar to an adaptive re-weighting scheme, such that QA feature vectors are weighted comparably to MSA feature vectors during the initial EM training, then transformed by $A$, and then re-weighted to an increased final weight of $\gamma_k(t) + \tau \xi_k(t) / N_k$. The effective weight of each MSA datum is similarly decreased, during MAP adaptation, to only $\tau \xi_k(t) / N_k$.

The effect of this iterative strategy is to give greater weight to MSA data during the initial training of the model, when the MSA data may be useful to help the learning algorithm avoid spurious local optima in the likelihood function; after the model parameters have converged to a solution that is optimal for the pooled MSA+QA data, MLLR improves the representation of QA data, and, finally, MAP is used to increase the relative importance of QA data in the final training criterion.

7. Discussion

Even though the differences between MSA and Arabic dialects are large, to the extent that we can consider Arabic dialects as totally different languages (Ferguson, 1959), we can still benefit from MSA speech resources to improve dialectal Arabic speech recognition. The performance of the data pooling approach may be affected by the ratio of the amount of dialectal data to the amount of MSA data. In our case, the data pooling approach results in an absolute WER of 56.0% on the dev. set and 64.4% on the eval. set. The amount of MSA data is about five times the amount of dialectal data.
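The re-weighting argument of Eq. (1) and (2) can be checked on toy numbers. The sketch below is only an illustration under stated assumptions: features are one-dimensional, a scalar `a` stands in for the MLLR transformation, and the MAP mean update is taken in its usual form with prior weight `tau`, posteriors `gamma`, and EM weighting coefficients `xi`. All function names and the toy data (a 1:5 ratio of QA to MSA frames) are hypothetical and are not taken from the experiments.

```python
def pooled_mean(qa, msa, xi_qa, xi_msa):
    """Eq. (2): EM mean of one Gaussian over pooled QA and MSA frames."""
    n_k = sum(xi_qa) + sum(xi_msa)
    weighted = sum(w * x for w, x in zip(xi_qa, qa))
    weighted += sum(w * x for w, x in zip(xi_msa, msa))
    return weighted / n_k, n_k


def map_mean(qa, gamma, prior_mean, tau, a=1.0):
    """Eq. (1): MAP-adapted mean; `a` is a scalar stand-in for MLLR."""
    numerator = sum(g * x for g, x in zip(gamma, qa)) + tau * a * prior_mean
    return numerator / (sum(gamma) + tau)


def effective_weights(gamma_t, xi_t, tau, n_k):
    """Final per-frame weights implied by combining Eq. (1) and (2)."""
    msa_weight = tau * xi_t / n_k
    qa_weight = gamma_t + msa_weight
    return qa_weight, msa_weight


# Toy data (hypothetical): 2 QA frames at 2.0 and 10 MSA frames at 0.0.
qa, msa = [2.0, 2.0], [0.0] * 10
mu_k, n_k = pooled_mean(qa, msa, [1.0] * 2, [1.0] * 10)
mu_hat = map_mean(qa, [1.0] * 2, mu_k, tau=10.0)
qa_w, msa_w = effective_weights(1.0, 1.0, tau=10.0, n_k=n_k)
print(mu_k, mu_hat, qa_w, msa_w)
```

In this toy case the pooled mean sits close to the MSA frames, and the MAP step moves it toward the QA frames because each QA frame ends up with a larger effective weight than each MSA frame.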
In order to boost the contribution of dialectal data, MLLR and MAP adaptations are then applied on the pooled acoustic model, effectively increasing the weight of dialectal acoustic features in the final cross-lingual model. The combination of data pooling followed by acoustic model adaptation results in a lower absolute WER of 55.6% on the dev. set and 62.5% on the eval. set. Lattice MBR decoding contributes a further reduction in WER, achieving 47.9% on the dev. set and 56.8% on the eval. set.

8. Conclusions and Future Work

In this paper, we propose a speech recognition system for Qatari Colloquial Arabic (QA). Given the limited dialectal resources, our proposed method utilizes MSA data through cross-dialectal phone mapping, data pooling, acoustic model adaptation, and system combination, and has achieved 21.3% and 28.9% relative WER reduction on the QA development set and evaluation set respectively. For future work, it is possible to extend the current framework to speech recognition systems for other dialects. Moreover, some future directions are to incorporate recent achievements in transfer learning and domain adaptation to further improve the system performance (Pan and Yang, 2010). In addition, the cross-lingual training and adaptation can be bidirectional; a multi-task framework of Arabic speech recognition can be formulated so that both MSA and dialectal recognition performance can be enhanced simultaneously (Caruana, 1997).

9. Acknowledgment

This publication was made possible by a grant from the Qatar National Research Fund under its National Priorities Research Program (NPRP) award number NPRP […]. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Qatar National Research Fund. We would also like to acknowledge the European Language Resources Association (ELRA) and the Linguistic Data Consortium (LDC) for providing us with data resources.

References

Billa, J., Noamany, M., Srivastava, A., Liu, D., Stone, R., Xu, J., Makhoul, J. and Kubala, F. (2002). Audio indexing of Arabic broadcast news. Proceedings of ICASSP, vol. 1, pp. […].

Caruana, R. (1997). Multitask learning. Machine Learning, vol. 28, no. 1, pp. […].

Elmahdy, M., Gruhn, R., Minker, W. and Abdennadher, S. (2010). Cross-lingual acoustic modeling for dialectal Arabic speech recognition. Proceedings of INTERSPEECH, pp. […].

Ferguson, C.A. (1959). Diglossia. Word, vol. 15, pp. […].

Goel, V. and Byrne, W. (2000). Minimum Bayes-Risk Automatic Speech Recognition. Computer Speech and Language, 14(2), pp. […].

Huang, P.-S. and Hasegawa-Johnson, M. (2012). Cross-dialectal data transferring for Gaussian mixture model training in Arabic speech recognition. International Conference on Arabic Language Processing.

Kirchhoff, K. and Vergyri, D. (2005). Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition. Speech Communication, vol. 46(1), pp. […].

Lee, C.-H. and Gauvain, J.-L. (1993). Speaker adaptation based on MAP estimation of HMM parameters. Proceedings of ICASSP, vol. II, pp. […].

Leggetter, C.J. and Woodland, P.C. (1995). Maximum likelihood linear regression for speaker adaptation of the parameters of continuous density hidden Markov models. Computer Speech and Language, vol. 9, pp. […].

Mangu, L., Kuo, H.-K., Chu, S., Kingsbury, B., Saon, G., Soltau, H. and Biadsy, F. (2011). The IBM 2011 GALE Arabic Speech Transcription System. Proceedings of ASRU, pp. […].

NEMLAR Broadcast News Speech Corpus, ELRA catalog reference: ELRA-S0219, http://catalog.elra.info/product_info.php?products_id=874

NetDC Arabic BNSC (Broadcast News Speech Corpus), ELRA catalog reference: ELRA-S0157, http://catalog.elra.info/product_info.php?products_id=13

Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. […].

Parker, R., Graff, D., Chen, K., Kong, J. and Maeda, K. (2009). Arabic Gigaword Fourth Edition. Linguistic Data Consortium, Pennsylvania, LDC Catalog No.: LDC2009T30, ISBN: […].

Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G. and Vesely, K. (2011). The Kaldi Speech Recognition Toolkit. Proceedings of IEEE ASRU.

Vergyri, D., Kirchhoff, K., Gadde, R., Stolcke, A. and Zheng, J. (2005). Development of a conversational telephone speech recognizer for Levantine Arabic. Proceedings of INTERSPEECH, pp. […].

Neural Network Model of the Backpropagation Algorithm

Neural Network Model of the Backpropagation Algorithm Neural Nework Model of he Backpropagaion Algorihm Rudolf Jakša Deparmen of Cyberneics and Arificial Inelligence Technical Universiy of Košice Lená 9, 4 Košice Slovakia jaksa@neuron.uke.sk Miroslav Karák

More information

Channel Mapping using Bidirectional Long Short-Term Memory for Dereverberation in Hands-Free Voice Controlled Devices

Channel Mapping using Bidirectional Long Short-Term Memory for Dereverberation in Hands-Free Voice Controlled Devices Z. Zhang e al.: Channel Mapping using Bidirecional Long Shor-Term Memory for Dereverberaion in Hands-Free Voice Conrolled Devices 525 Channel Mapping using Bidirecional Long Shor-Term Memory for Dereverberaion

More information

Fast Multi-task Learning for Query Spelling Correction

Fast Multi-task Learning for Query Spelling Correction Fas Muli-ask Learning for Query Spelling Correcion Xu Sun Dep. of Saisical Science Cornell Universiy Ihaca, NY 14853 xusun@cornell.edu Anshumali Shrivasava Dep. of Compuer Science Cornell Universiy Ihaca,

More information

More Accurate Question Answering on Freebase

More Accurate Question Answering on Freebase More Accurae Quesion Answering on Freebase Hannah Bas, Elmar Haussmann Deparmen of Compuer Science Universiy of Freiburg 79110 Freiburg, Germany {bas, haussmann}@informaik.uni-freiburg.de ABSTRACT Real-world

More information

An Effiecient Approach for Resource Auto-Scaling in Cloud Environments

An Effiecient Approach for Resource Auto-Scaling in Cloud Environments Inernaional Journal of Elecrical and Compuer Engineering (IJECE) Vol. 6, No. 5, Ocober 2016, pp. 2415~2424 ISSN: 2088-8708, DOI: 10.11591/ijece.v6i5.10639 2415 An Effiecien Approach for Resource Auo-Scaling

More information

MyLab & Mastering Business

MyLab & Mastering Business MyLab & Masering Business Efficacy Repor 2013 MyLab & Masering: Business Efficacy Repor 2013 Edied by Michelle D. Speckler 2013 Pearson MyAccouningLab, MyEconLab, MyFinanceLab, MyMarkeingLab, and MyOMLab

More information

1 Language universals

1 Language universals AS LX 500 Topics: Language Uniersals Fall 2010, Sepember 21 4a. Anisymmery 1 Language uniersals Subjec-erb agreemen and order Bach (1971) discusses wh-quesions across SO and SO languages, hypohesizing:...

More information

Information Propagation for informing Special Population Subgroups about New Ground Transportation Services at Airports

Information Propagation for informing Special Population Subgroups about New Ground Transportation Services at Airports Downloaded from ascelibrary.org by Basil Sephanis on 07/13/16. Copyrigh ASCE. For personal use only; all righs reserved. Informaion Propagaion for informing Special Populaion Subgroups abou New Ground

More information

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Navdeep Jaitly 1, Vincent Vanhoucke 2, Geoffrey Hinton 1,2 1 University of Toronto 2 Google Inc. ndjaitly@cs.toronto.edu,

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Investigation on Mandarin Broadcast News Speech Recognition

Investigation on Mandarin Broadcast News Speech Recognition Investigation on Mandarin Broadcast News Speech Recognition Mei-Yuh Hwang 1, Xin Lei 1, Wen Wang 2, Takahiro Shinozaki 1 1 Univ. of Washington, Dept. of Electrical Engineering, Seattle, WA 98195 USA 2

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition

Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition Yanzhang He, Eric Fosler-Lussier Department of Computer Science and Engineering The hio

More information

Deep Neural Network Language Models

Deep Neural Network Language Models Deep Neural Network Language Models Ebru Arısoy, Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran IBM T.J. Watson Research Center Yorktown Heights, NY, 10598, USA {earisoy, tsainath, bedk, bhuvana}@us.ibm.com

More information

A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK. Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren

A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK. Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren Speech Technology and Research Laboratory, SRI International,

More information

INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT

INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT INVESTIGATION OF UNSUPERVISED ADAPTATION OF DNN ACOUSTIC MODELS WITH FILTER BANK INPUT Takuya Yoshioka,, Anton Ragni, Mark J. F. Gales Cambridge University Engineering Department, Cambridge, UK NTT Communication

More information

DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE

DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) DIRECT ADAPTATION OF HYBRID DNN/HMM MODEL FOR FAST SPEAKER ADAPTATION IN LVCSR BASED ON SPEAKER CODE Shaofei Xue 1

More information

arxiv: v1 [cs.cl] 27 Apr 2016

arxiv: v1 [cs.cl] 27 Apr 2016 The IBM 2016 English Conversational Telephone Speech Recognition System George Saon, Tom Sercu, Steven Rennie and Hong-Kwang J. Kuo IBM T. J. Watson Research Center, Yorktown Heights, NY, 10598 gsaon@us.ibm.com

More information

Improvements to the Pruning Behavior of DNN Acoustic Models

Improvements to the Pruning Behavior of DNN Acoustic Models Improvements to the Pruning Behavior of DNN Acoustic Models Matthias Paulik Apple Inc., Infinite Loop, Cupertino, CA 954 mpaulik@apple.com Abstract This paper examines two strategies that positively influence

More information

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation

A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Distributed Learning of Multilingual DNN Feature Extractors using GPUs

Distributed Learning of Multilingual DNN Feature Extractors using GPUs Distributed Learning of Multilingual DNN Feature Extractors using GPUs Yajie Miao, Hao Zhang, Florian Metze Language Technologies Institute, School of Computer Science, Carnegie Mellon University Pittsburgh,

More information

SEMI-SUPERVISED ENSEMBLE DNN ACOUSTIC MODEL TRAINING

SEMI-SUPERVISED ENSEMBLE DNN ACOUSTIC MODEL TRAINING SEMI-SUPERVISED ENSEMBLE DNN ACOUSTIC MODEL TRAINING Sheng Li 1, Xugang Lu 2, Shinsuke Sakai 1, Masato Mimura 1 and Tatsuya Kawahara 1 1 School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501,

More information

SPEECH RECOGNITION CHALLENGE IN THE WILD: ARABIC MGB-3

SPEECH RECOGNITION CHALLENGE IN THE WILD: ARABIC MGB-3 SPEECH RECOGNITION CHALLENGE IN THE WILD: ARABIC MGB-3 Ahmed Ali 1,2, Stephan Vogel 1, Steve Renals 2 1 Qatar Computing Research Institute, HBKU, Doha, Qatar 2 Centre for Speech Technology Research, University

More information

Improved Hindi Broadcast ASR by Adapting the Language Model and Pronunciation Model Using A Priori Syntactic and Morphophonemic Knowledge

Improved Hindi Broadcast ASR by Adapting the Language Model and Pronunciation Model Using A Priori Syntactic and Morphophonemic Knowledge Improved Hindi Broadcast ASR by Adapting the Language Model and Pronunciation Model Using A Priori Syntactic and Morphophonemic Knowledge Preethi Jyothi 1, Mark Hasegawa-Johnson 1,2 1 Beckman Institute,

More information

Transferring Knowledge from a RNN to a DNN

arXiv v1 [cs.LG], 7 Apr 2015. William Chan 1, Nan Rosemary Ke 1, Ian Lane 1,2 Carnegie Mellon University 1 Electrical and Computer Engineering, 2 Language Technologies Institute Equal contribution

Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition

Hua Zhang, Yun Tang, Wenju Liu and Bo Xu National Laboratory of Pattern Recognition Institute of Automation, Chinese

LOW-RANK AND SPARSE SOFT TARGETS TO LEARN BETTER DNN ACOUSTIC MODELS

Pranay Dighe Afsaneh Asaei Hervé Bourlard Idiap Research Institute, Martigny, Switzerland École Polytechnique Fédérale de Lausanne (EPFL),

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING

Gábor Gosztolya 1, Tamás Grósz 1, László Tóth 1, David Imseng 2 1 MTA-SZTE Research Group on Artificial

Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction

INTERSPEECH 2015 Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa Department of Computer

Speech Emotion Recognition Using Support Vector Machine

Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,

The 2014 KIT IWSLT Speech-to-Text Systems for English, German and Italian

Kevin Kilgour, Michael Heck, Markus Müller, Matthias Sperber, Sebastian Stüker and Alex Waibel Institute for Anthropomatics Karlsruhe

Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006, p. 808. Jeih-Weih Hung, Member,

A study of speaker adaptation for DNN-based speech synthesis

Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

Multi-Lingual Text Leveling

Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation

AUTHORS AND AFFILIATIONS MSR: Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova,

Language Model and Grammar Extraction Variation in Machine Translation

Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department

The A2iA Multi-lingual Text Recognition System at the second Maurdor Evaluation

2014 14th International Conference on Frontiers in Handwriting Recognition Bastien Moysset, Théodore Bluche, Maxime Knibbe,

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

Arabic Orthography vs. Arabic OCR

Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among

Speech Translation for Triage of Emergency Phonecalls in Minority Languages

Udhyakumar Nallasamy, Alan W Black, Tanja Schultz, Robert Frederking Language Technologies Institute Carnegie Mellon University

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION

Mitchell McLaren 1, Yun Lei 1, Luciana Ferrer 2 1 Speech Technology and Research Laboratory, SRI International, California, USA 2 Departamento

Division of Arts, Humanities & Wellness Department of World Languages and Cultures. Course Syllabus اللغة والثقافة العربية ١ LAN 115

Semester and Year: Course and Section number: Meeting Times: INSTRUCTOR: Office Location: Phone: Office

DOMAIN MISMATCH COMPENSATION FOR SPEAKER RECOGNITION USING A LIBRARY OF WHITENERS. Elliot Singer and Douglas Reynolds

Massachusetts Institute of Technology Lincoln Laboratory {es,dar}@ll.mit.edu ABSTRACT

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011

Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition

Abir Masmoudi 1,2, Mariem Ellouze Khemakhem 1, Yannick Estève 2, Lamia Hadrich Belguith 1 and Nizar Habash 3 (1) ANLP Research group,

DNN ACOUSTIC MODELING WITH MODULAR MULTI-LINGUAL FEATURE EXTRACTION NETWORKS

Jonas Gehring 1 Quoc Bao Nguyen 1 Florian Metze 2 Alex Waibel 1,2 1 Interactive Systems Lab, Karlsruhe Institute of Technology;

Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing

Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, and Alan W Black Carnegie Mellon University,

Letter-based speech synthesis

Oliver Watts, Junichi Yamagishi, Simon King Centre for Speech Technology Research, University of Edinburgh, UK O.S.Watts@sms.ed.ac.uk jyamagis@inf.ed.ac.uk Simon.King@ed.ac.uk

Noisy SMS Machine Translation in Low-Density Languages

Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of

A Comparison of DHMM and DTW for Isolated Digits Recognition System of Arabic Language

Z.HACHKAR 1,3, A. FARCHI 2, B.MOUNIR 1, J. EL ABBADI 3 1 Ecole Supérieure de Technologie, Safi, Morocco. zhachkar2000@yahoo.fr.

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

What do Medical Students Need to Learn in Their English Classes?

ISSN - Journal of Language Teaching and Research, Vol., No., pp. 1-, May ACADEMY PUBLISHER Manufactured in Finland. doi:.0/jltr...1- Giti

COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS

Joris Pelemans 1, Kris Demuynck 2, Hugo Van hamme 1, Patrick Wambacq 1 1 Dept. ESAT, Katholieke Universiteit Leuven, Belgium

Reading Horizons. A Look At Linguistic Readers. Nicholas P. Criscuolo APRIL Volume 10, Issue Article 5

Reading Horizons, Volume 10, Issue 3, Article 5, April 1970. New Haven, Connecticut Public Schools. Copyright © 1970 by the authors. Reading Horizons

Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition

MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Seltzer, M.L.; Raj, B.; Stern, R.M. TR2004-088 December 2004 Abstract

Human Emotion Recognition From Speech

RESEARCH ARTICLE OPEN ACCESS Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati

Calibration of Confidence Measures in Speech Recognition

Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

Automatic Assessment of Spoken Modern Standard Arabic

Jian Cheng, Jared Bernstein, Ulrike Pado, Masanori Suzuki Pearson Knowledge Technologies 299 California Ave, Palo Alto, CA 94306 jian.cheng@pearson.com

LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano

Semantic Computing Group, CITEC, Bielefeld University {jgaspers cimiano}@cit-ec.uni-bielefeld.de ABSTRACT Semantic parsers

Books Effective Literacy Y5-8 Learning Through Talk Y4-8 Switch onto Spelling Spelling Under Scrutiny

By the End of Year 8 All Essential words lists 1-7 290 words Commonly Misspelt Words-55 working out more complex, irregular, and/or ambiguous words by using strategies such as inferring the unknown from

International Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012

Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology

ISCA Archive SUBJECTIVE EVALUATION FOR HMM-BASED SPEECH-TO-LIP MOVEMENT SYNTHESIS

Problems of the Arabic OCR: New Attitudes

Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing

A hybrid approach to translate Moroccan Arabic dialect

Ridouane Tachicart Mohammadia school of Engineers Mohamed Vth Agdal University, Rabat, Morocco tachicart@gmail.com Karim Bouzoubaa Mohammadia school

Vimala.C Project Fellow, Department of Computer Science Avinashilingam Institute for Home Science and Higher Education and Women Coimbatore, India

World of Computer Science and Information Technology Journal (WCSIT) ISSN: 2221-0741 Vol. 2, No. 1, 1-7, 2012 A Review on Challenges and Approaches

Switchboard Language Model Improvement with Conversational Data from Gigaword

Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT)

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Text-as-Data series, September 17, 2015. What do we want from text? 1. Extract information 2. Link

Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification

Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,

A Student's Assistant for Open e-learning

T4E 2009 Aparna Lalingar IIITB * Bangalore, India e-mail: aparna.l@iiitb.ac.in Srinivasan Ramani IIITB * and HP Labs India Bangalore, India e-mail: ramanisl@vsnl.com

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models

Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &

Using EEG to Improve Massive Open Online Courses Feedback Interaction

Haohan Wang, Yiwei Li, Xiaobo Hu, Yucong Yang, Zhu Meng, Kai-min Chang Language Technologies Institute School of Computer Science Carnegie

ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS

Annamaria Mesaros 1, Toni Heittola 1, Antti Eronen 2, Tuomas Virtanen 1 1 Department of Signal Processing Tampere University of Technology Korkeakoulunkatu

Malicious User Suppression for Cooperative Spectrum Sensing in Cognitive Radio Networks using Dixon's Outlier Detection Method

Sanket S. Kalamkar and Adrish Banerjee Department of Electrical Engineering

Proceedings of Meetings on Acoustics

Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production

Word Segmentation of Off-line Handwritten Documents

Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

Small-footprint Highway Deep Neural Networks for Speech Recognition

IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL XXX, NO. XXX, 2017 Liang Lu Member, IEEE, Steve Renals Fellow,

Listening and Speaking Skills of English Language of Adolescents of Government and Private Schools

Dr. Amardeep Kaur Professor, Babe Ke College of Education, Mudki, Ferozepur, Punjab Abstract The present

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty

Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu

Dropout improves Recurrent Neural Networks for Handwriting Recognition

2014 14th International Conference on Frontiers in Handwriting Recognition Vu Pham, Théodore Bluche, Christopher Kermorvant, and Jérôme

TEKS Correlations Proclamation 2017

Correlations to the Texas Essential Knowledge and Skills (TEKS): Material Subject Course Publisher Program Title Program ISBN TEKS Coverage (%) Chapter 114. Texas Essential

UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS. Heiga Zen, Haşim Sak

Google {heigazen,hasim}@google.com ABSTRACT Long short-term

Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers

October 31, 2003 Amit Juneja Department of Electrical and Computer Engineering University of Maryland, College Park,

DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE

The 9th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 Adela BÂRA,

Online Updating of Word Representations for Part-of-Speech Tagging

Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

Grade 4. Common Core Adoption Process. (Unpacked Standards)

Grade 4 Reading: Literature RL.4.1 Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009, p. 423. George

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH

JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

Speaker recognition using universal background model on YOHO database

Aalborg University Master Thesis project Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,

Reducing Features to Improve Bug Prediction

Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

SCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany

Journal of Reading Behavior 1980, Vol. II, No. 1 Abstract. Forty-eight college students listened to

NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches

Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science

UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation

Taufiq Hasan Gang Liu Seyed Omid Sadjadi Navid Shokouhi The CRSS SRE Team John H.L. Hansen Keith W. Godin Abhinav Misra Ali Ziaei Hynek Bořil