Investigation of Indian English Speech Recognition using CMU Sphinx
Disha Kaur Phull, School of Computing Science & Engineering, VIT University Chennai Campus, Tamil Nadu, India.
G. Bharadwaja Kumar, School of Computing Science & Engineering, VIT University Chennai Campus, Tamil Nadu, India.

Abstract- In recent years, speech recognition research has devoted much attention to the automatic transcription of speech data such as broadcast news (BN), medical transcription, etc. Large Vocabulary Continuous Speech Recognition (LVCSR) systems have been developed successfully for Englishes (American English (AE), British English (BE), etc.) and other languages, but for Indian English (IE) the work is still in its infancy. IE is one of the varieties of English spoken in the Indian subcontinent and is largely different from the English spoken in other parts of the world. In this paper, we present our work on LVCSR of IE video lectures. The speech data, comprising 23 hours, contains video lectures on various engineering subjects given by experts from all over India as part of the NPTEL project. We have used CMU Sphinx for training and decoding in our large vocabulary continuous speech recognition experiments. The analysis of results shows that building an IE acoustic model for IE speech recognition is essential, as it has given a 34% lower average word error rate (WER) than the HUB-4 acoustic models. The average WER before and after adaptation of the IE acoustic model is 38% and 31% respectively. Even though our IE acoustic model is trained with limited data and the corpora used for building the language models do not mimic the spoken language, the results are promising and comparable to the results reported for AE lecture recognition in the literature. Keywords: CMU Sphinx, Indian English, Lecture Recognition.
Introduction

Automatic Speech Recognition (ASR) is a state-of-the-art technology that converts speech into text, making it easier both to create and to use information. The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics or accent. During the past few decades, substantial progress has been reported in ASR for many languages such as English, Finnish and German. Recently, there is a growing interest in large vocabulary continuous speech recognition (LVCSR) research for Indian Languages (IL). Several works have been carried out for Indian languages such as Tamil, Telugu, Bengali and Hindi. However, speech recognition work on Indian English (IE) has not received as much attention as these languages. The languages spoken in India belong to four major language families: Indo-Aryan, Dravidian, Austro-Asiatic, and Sino-Tibetan. In accordance with India's vast population, the figures relating to languages are also very impressive. The Indian constitution has given official status to 22 Indian languages as well as English. Apart from these, there are many other languages spoken in India. Linguists believe that there are nearly 150 different languages and about 2000 dialects in India [1]. Here, dialect refers to variation at all linguistic levels, i.e., vocabulary, idiom, grammar and pronunciation. Differences among dialects are mainly due to regional and social factors, and they vary in terms of pronunciation, vocabulary and grammar [2]. Accent refers to the variety in pronunciations of a certain language and to the sounds that exist in a person's language [3]. The term IE is commonly used to refer to English spoken as a second language in India [4]. IE plays the role of a lingua franca [5].
IE has many distinctive pronunciations, some distinctive syntax and quite a bit of lexical variation. Any linguistic description seeking to characterize IE must take cognizance of its highly variable nature, as it comes in a range of varieties, both regional and social [6]. Indian English accents vary greatly. Pronunciation is strongly influenced by the speaker's native language and educational background. Another major reason for variation is that IE rhythm follows the rhythm of Indian languages [7], i.e., it is syllable-timed (roughly equal time is taken to utter each syllable). English, by contrast, is known to be a stress-timed language (at both the syllable and word level), where only certain words in a sentence or phrase are stressed; this is an important feature of Received Pronunciation (RP). Stressing syllables and words correctly is often an area of great difficulty for speakers of IE. The extent to which Indian features of pronunciation occur in the speech of an individual varies from person to person. In [8], Peri Bhaskararao compared Indian English with British English (BE) pronunciation. Diphthongs in BE correspond to pure long vowels in Indian pronunciation (e.g. cake and poor pronounced as /ke:k/ and /pu:r/, respectively); the alveolar sounds /t/ and /d/ of British Received Pronunciation (BRP) are pronounced as retroflexes (harsher sounds); the dental fricatives /θ/ and /ð/ are replaced by soft th and soft d (e.g. thick is pronounced as /thik/ rather than /θik/); /v/ and /w/ of BRP are both pronounced somewhat like /w/ in many parts of India, and they are usually merged with b in Bengali, Assamese and Oriya pronunciations of English. Some words that are not found in Englishes elsewhere are used in IE. These are either innovations or translations of native words or phrases. Examples include cousin brother (for male cousin), prepone (advance or bring forward in time), and foreign-returned (returned from abroad). There are Indianisms in grammar, such as the pluralization of non-count nouns (e.g. breads, foods, advices) and the use of the present progressive for the simple present (I am knowing). In IE, there is a lack of aspiration in the word-initial position: cat is pronounced with /k/ but not /kh/, because of the phonemic contrast between the unvoiced unaspirated velar /k/ and the unvoiced aspirated velar /kh/. Also, some fricatives become bilabial, and interdentals are absent in IE. In this paper, we present our experiments on large vocabulary continuous speech recognition for Indian English. The Indian English speech data is extracted from NPTEL videos. NPTEL is a government-funded project that provides E-learning through online Web and Video courses in Engineering, Science and Humanities streams [9]. The vision of this project is to provide lectures from experts at prominent educational institutions for the benefit of students in various educational institutions in different parts of India. Currently, there are lectures by 130 speakers on various subjects. The organization of the paper is as follows: Section 2 briefly summarizes ASR work on Indian English as well as other Indian languages, and also accent-based ASR work for some other languages. Section 3 describes the experimental setup and methodology for IE speech recognition. Section 4 describes our experiments and results.

A Brief Survey on Indian Language Speech Recognition

In India, early work on large vocabulary speech recognition started with the Hindi language around the late 1990s. Samudravijaya et al.
[10] proposed a speech recognition system for Hindi which follows a hierarchical approach to speech recognition. Kumar et al. [11] proposed a large vocabulary continuous speech recognition system for Hindi based on the IBM ViaVoice speech recognizer; depending on the vocabulary size, the system gives a word accuracy of 75% to 95%. Gopalakrishna et al. [12] carried out medium vocabulary speech recognition using Sphinx for three languages, Marathi, Telugu and Tamil, in different environments such as landline and cellphone. They obtained word error rates (WER) of around 20.7%, 19.4% and 15.4% on landline data and 23.6%, 17.6% and 18.3% on cellphone data for Marathi, Tamil and Telugu respectively. Pratyush Banerjee et al. [13] used the Hidden Markov Model (HMM) toolkit for Bengali continuous speech recognition. They obtained an average recognition rate of 76.33% for male speakers and 52.34% for female speakers. Thangarajan et al. [14] carried out experiments using triphone-based models for Tamil speech recognition and achieved 88% accuracy over limited data. They also tried context-independent syllable models [2] for Tamil speech recognition, which underperformed context-dependent phone models. Lakshmi Sarada et al. [15] tried a group delay based algorithm to automatically segment and label continuous speech into syllable-like units for Indian languages, with a new feature extraction technique that uses features extracted from multiple frame sizes and frame rates. They achieved recognition rates of 48.7% and 45.36% for Tamil and Telugu respectively. Ma et al. [16] classified three accents of English recorded from the three main ethnicities in Malaysia, namely Malay, Chinese and Indian. They used only statistical descriptors of Mel-band spectral energy and a neural network as the recognizer engine.
They evaluated these experiments on three independent datasets of 20%, 30%, and 40% of the total samples, and on average a 95.59% classification rate was achieved. Huang et al. [17] carried out extensive experiments to evaluate the effect of accent on speech recognition using the Microsoft Mandarin speech engine for three different Mandarin accents: Beijing, Shanghai and Guangdong. They found that there is about a 40-50% increase in character error rate for cross-accent speech recognition. Herman Kamper et al. [18] investigated ways to combine speech data of five South African accents of English in order to improve overall speech recognition performance. Three acoustic modeling approaches were considered in this work: separate accent-specific models, accent-independent models and multi-accent models. They found that multi-accent models, obtained by introducing accent-based questions in the decision tree clustering, outperformed the other modeling approaches in both phone and word recognition experiments. Only a little work has been carried out on Indian English speech recognition. Here, we describe a few ASR works that have been carried out for IE. Kulkarni et al. [19] studied the effect of accent variability on the performance of the Indian English ASR task. They carried out this work on the LILA Indian English database, which covers 10 different Indian accents, using the Siemens SpeechAdvance ASR server. They trained three different HMMs, namely accent-specific, accent-pooled (combining all the accent-specific training data) and a reduced set of the accent-pooled training data. They found that the accent-pooled training set performed well on a phonetically rich isolated speech recognition task. Deshpande et al. [20] distinguished between AE and IE using the second and third formant frequencies of specific accent markers. A simple Gaussian Mixture Model (GMM) was used for classification.
The second and third formant frequencies were calculated using LPC roots, imposing constraints on the bandwidth and the range of each formant. Their results show that the formant frequencies of these accent markers alone are enough to achieve classification for those two accent groups. Olga et al. [21] performed an acoustic phonetic analysis of vowels produced by North Indians whose second language is English. They concluded that North Indian English is a separate variety of IE. Srikant Joshi et al. [22] observed that IE speech is better represented by Hindi speech models for vowels common to the two languages than by AE models. The study of Wiltshire et al. [23] revealed that both the phonemic and phonological influence of the native language on proficient speakers' accent in IE appears in segmental and supra-segmental properties of speech. The investigation of Hema et al. [24] on the sound structure of Indian English pointed out that the L1 (native language) effect in IE might be reflected in the incomplete acquisition of the target phonology, the influence of sociolinguistic factors on its use, and the evolution of IE.

Experimental Setup

There are three basic steps in building our Indian English LVCSR system: phonetic dictionary creation, acoustic modeling and language modeling.

Creating Phonetic Dictionary

Since most Indian languages are phonetic in nature, Grapheme to Phoneme (G2P) conversion needs only simple mapping tables and rules for the lexical representation. However, IE pronunciation differs largely from American and other English pronunciations, and also varies according to regional and educational background within India itself. Hence, phonetic dictionary creation is a non-trivial task for IE. Initially, we manually created a phonetic dictionary of around 20,000 words, which includes words from the training corpus and other frequent words of the English language. Our phonetic dictionary contains 41 phones that are specific to the Indian accent. Then, we built basic pronunciation models on these 20K words using the Sequitur G2P software, a data-driven grapheme-to-phoneme converter based on joint-sequence models [25]. Later, we applied these models to the Link Grammar Parser's dictionary [26] to get a larger pronunciation dictionary, and then manually corrected the result. We then rebuilt our G2P models using this larger pronunciation dictionary. Finally, we used these G2P models to produce the pronunciation dictionary used in our speech recognition experiments. Currently, the pronunciation in this dictionary matches mostly the IE pronounced in the Andhra Pradesh region, since the dictionary has been created and corrected by Telugu speakers. In future, we plan to build G2P models for various accents of IE.
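The phonetic dictionary described above is, in essence, a plain-text lexicon in the CMUdict-style format that SphinxTrain and Sphinx-4 consume: one word per line followed by its phone sequence. A minimal loading sketch is given below; the phone symbols and entries are illustrative assumptions for exposition, not the actual 41-phone IE inventory used in this work.

```python
# Minimal loader for a CMUdict-style pronunciation dictionary
# (format: WORD PH1 PH2 ...). The IE-flavoured entries below are
# illustrative, e.g. a long monophthong for "cake" instead of the
# BE diphthong, as discussed in the text.

SAMPLE_DICT = """\
cake K EE K
poor P UU R
thick TH I K
"""

def load_dictionary(text):
    """Parse 'word phone phone ...' lines into a word -> phone-list map."""
    lexicon = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            lexicon[parts[0]] = parts[1:]
    return lexicon

lexicon = load_dictionary(SAMPLE_DICT)
print(lexicon["cake"])   # the phone sequence used during training/decoding
```

In practice such a file is produced by the G2P models and handed to the trainer and decoder unchanged.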
Acoustic Modeling

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. In the present work, we have used SphinxTrain [27] for building the acoustic model. The overall process followed by SphinxTrain for creating the acoustic models is shown in Fig.1.

Figure 1: The process involved in acoustic modeling.

NPTEL lecture videos have been used for building the Indian English acoustic models as shown in Fig.2. The video lectures cover various topics from science and engineering, spoken by lecturers from the IITs and other premier institutes. These speakers are from various regions of India and speak various accents of Indian English. We considered the lecture videos of 75 speakers for transcription in order to train the acoustic model. The data had been video recorded at a 44 kHz sampling frequency; we converted it into wav format and down-sampled it to 16 kHz, 16-bit mono. Then, we manually transcribed the audio files, considering a minimum of 15 minutes of speech per speaker. The total speech data comprises 23 hours. Mel frequency cepstral coefficients and their derivatives have been used as features. We then built tri-state context-dependent HMMs for each phone. After several experiments, we set the number of Gaussians in the GMM modeling to 32 and chose the number of senones [28] for decision tree clustering accordingly, since our speech data comprises only 23 hours and the vocabulary is also limited.

Figure 2: The process followed for creating the IE acoustic model.

Adaptation

The goal of adaptation techniques is to flex speaker-independent models toward speaker-dependent ones using far less data than full speaker-dependent training would require. Many state-of-the-art LVCSR systems use speaker-adapted models to improve robustness with respect to speaker variability.
The HMM models of the ASR systems are adapted using Maximum Likelihood Linear Regression (MLLR) [29]. It transforms speaker-independent models to speaker-dependent ones by capturing information specific to the speaker. MLLR adapts the observation probability of an HMM in a parametric way by finding a transform that maximizes the likelihood of the adaptation data given the transformed Gaussian parameters.
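The effect of an MLLR mean transform can be sketched in a few lines: each Gaussian mean μ is mapped to Aμ + b, where the matrix A and bias b are estimated from the adaptation data. The estimation itself (an EM step, as performed by the Sphinx tools) is omitted here, and the transform values are purely illustrative.

```python
# MLLR adapts each Gaussian mean mu as mu' = A @ mu + b, with A and b
# shared across a regression class. Estimating A and b from adaptation
# data is omitted; the values below are illustrative.

def mllr_adapt_mean(mean, A, b):
    """Apply the affine MLLR transform to one mean vector."""
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]

A = [[1.1, 0.0],
     [0.0, 1.1]]          # mild scaling toward the target speaker
b = [0.5, -0.5]           # bias capturing a global spectral shift

print(mllr_adapt_mean([2.0, 3.0], A, b))   # approximately [2.7, 2.8]
```

Because one transform is shared by many Gaussians, MLLR can be estimated reliably from only a few minutes of adaptation speech.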
Language Modeling

Language models help a speech recognizer figure out how likely a word sequence is, independent of the acoustics. Furthermore, language models play a vital role in resolving acoustic confusions that arise from co-articulation, assimilation and homophones during decoding. In addition, continuous speech recognition suffers from difficulties such as variation due to sentence structure (prosody), interaction between adjacent words (cross-word co-articulation), and the lack of clear acoustic markers to delineate word boundaries. Hence, language models play a paramount role in guiding and constraining the search among the large number of alternative word hypotheses in continuous speech recognition. The N-gram language model is still the predominant choice in state-of-the-art speech recognizers. Typically, N-gram models for large vocabulary speech recognizers are trained on hundreds of millions or billions of word strings. In constructing such models, we usually face two problems. Firstly, the large amount of training data can lead to a large N-gram language model, which in turn leads to an excessively large hypothesis search space. Secondly, to train a domain-specific model, we must deal with the data sparseness problem, because large amounts of domain-specific data are not available. Language modeling of speech extracted from lecture videos suffers from inadequate training data, since the main source for such text is audio transcriptions. In general, text downloaded from the web, which is most often the primary source for collecting large amounts of training data, is not representative of the language encountered in lecture videos. Unfortunately, collecting large amounts of lecture videos and producing detailed transcriptions is very tedious. Also, lecture speech may contain dis-fluencies such as filled pauses, repetitions, and false starts.
In addition to dis-fluencies, there may be ungrammaticality and a language register different from that found in written texts. Some speakers may even use crutch words and foreign words within the lectures or during conversations. In the present work, we have engineered language models (LMs) from text corpora obtained from the web. Text standardization is one of the difficult tasks in building language models for large vocabulary speech recognition. Text must be divided into sentences; we used a rule-based sentence segmentation system for this task. All punctuation marks and special symbols are removed, except symbols associated with numerals. All numerals are converted to their orthographic form, including those occurring within alphanumeric words. Abbreviations are also taken into account, and all words are converted to lowercase. For the generation of the language models, three varieties of corpora have been considered. Firstly, the training transcription is used as the base variety of the language model, to tally with the speech in the lecture videos. Secondly, the Wikipedia dump [30] is used as the generic variety that contains words from various domains; we downloaded the Wikipedia dump and converted it into plain text using an open source tool called WP2TXT [31]. Thirdly, a domain-specific corpus pertaining to the lectures has been collected from the internet. Initially, we built separate tri-gram language models for the base and topic-specific corpora. Then, we built a bi-gram language model for the Wikipedia dump, considering the 64,000 most frequent words (words occurring more than 100 times in the corpus), using the varikn toolkit [32]. The varikn toolkit trains language models producing a compact set of high-order n-grams utilizing state-of-the-art Kneser-Ney smoothing.
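The text standardization steps described above can be sketched in a few lines. This is a deliberately simplified version: it drops all punctuation, and it spells numerals digit-by-digit rather than as full number names; a production normalizer would also handle abbreviations and symbols attached to numerals.

```python
import re

# Sketch of LM text standardization: strip punctuation, spell out
# numerals, lowercase. The tiny digit table is an illustrative
# stand-in for a full number-to-words converter.

SMALL = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
         "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def spell_digits(token):
    """Spell a purely numeric token digit-by-digit (simplification)."""
    return " ".join(SMALL[d] for d in token) if token.isdigit() else token

def normalize(sentence):
    sentence = re.sub(r"[^\w\s]", " ", sentence)        # drop punctuation/symbols
    tokens = [spell_digits(t) for t in sentence.split()]
    return " ".join(tokens).lower()

print(normalize("The cache holds 64 blocks."))
# -> "the cache holds six four blocks"
```

Every training corpus (transcriptions, Wikipedia text, domain text) passes through the same normalization so that the LM vocabulary is consistent.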
In Kneser-Ney smoothing, a lower-order probability distribution is modified to take into account what is already modelled by the higher-order distributions; hence we have used Kneser-Ney smoothing. These three language models are merged together using the SRILM toolkit [33] as described in Fig.3, and the merged language models are used for our speech recognition tasks.

Figure 3: The overall procedure for creating the Language Models.

In our experiments, we have considered five different domains, namely computer architecture (CA), computer networks (CN), computer organization (CO), database (DB) and operating systems (OS). The language models for these five domains were generated individually for domain-wise recognition. Table 1 shows the language models' perplexity and out-of-vocabulary (OOV) rates during the evaluation of the test transcripts.

Table 1: Perplexity for the created language models (columns: LM, Perplexity, Words, OOV rate (%); rows: CA, CN, CO, DB, OS).

Decoding

We have used Sphinx-4, a freely available robust speech recognizer, as the decoder for our speech recognition tasks [34]. There are three primary modules in the Sphinx-4 framework [35]: the FrontEnd, the Decoder, and the Linguist. The FrontEnd extracts features such as MFCC, PLP, LPCC, etc. The Linguist translates any type of standard language model, along with pronunciation information from the dictionary and structural information from one or more sets of acoustic models, into a search graph.
The most important component of the Decoder block is the search manager, which may perform search algorithms such as frame-synchronous Viterbi, A*, bi-directional search, and so on. The search manager uses the features from the FrontEnd and the search graph from the Linguist to perform the actual decoding and generate results. While setting the parameters in the decoder's configuration file, the absolute beam width, relative beam width and language weight values were determined experimentally. The absolute beam width limits the absolute number of paths explored in every frame, and the relative beam width prunes paths whose score falls more than the beam factor below the best score. Even though a smaller beam width speeds up the search, we may miss potential solutions by restricting the search space. After experimentation, we have fixed the beam widths, with the relative beam width set to 1E-80. Another important factor to be tuned during decoding is the language weight, because it decides how much relative importance is given to the actual acoustic probabilities of the words in the hypothesis. A value between 6 and 13 is suggested as the language weight [36]. A low language weight gives more leeway for words with high acoustic probabilities to be hypothesized, at the risk of hypothesizing spurious words. In our experiments, we used a language weight of 10.

Experiments and Results

In the initial experimentation stage, we investigated the impact of the pronunciation differences between AE, BE and IE on speech recognition performance. This work is essential for understanding the necessity of building separate acoustic models for decoding IE speech rather than using models already available for English, such as HUB-4 (AE) and WSJCAM0 (BE). For this reason, we have compared the performance of the speech recognition system using the HUB-4 [37], WSJCAM0 [38] and IE (developed by us) acoustic models.
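The two beam parameters can be illustrated on a single frame of hypotheses. This toy sketch keeps at most `absolute_beam` paths and drops any whose probability is more than `relative_beam` times below the best; real decoders such as Sphinx-4 apply the same idea per frame in the log domain, and the scores here are invented.

```python
# One frame of beam pruning: an absolute beam caps the number of
# surviving hypotheses, a relative beam drops those too far below
# the best score. Probabilities and thresholds are illustrative.

def prune(hyps, absolute_beam, relative_beam):
    """hyps: {hypothesis: probability}. Returns the surviving hypotheses."""
    ranked = sorted(hyps.items(), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:absolute_beam]                    # absolute beam
    best = ranked[0][1]
    return {h: p for h, p in ranked if p >= best * relative_beam}

hyps = {"speech": 1e-3, "speed": 4e-4, "peach": 1e-9, "each": 1e-12}
print(prune(hyps, absolute_beam=3, relative_beam=1e-4))
```

Tightening either beam speeds up the search at the cost of possibly pruning the correct path, which is exactly the trade-off tuned experimentally above.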
HUB-4 corpus contains 104 hours of broadcast news collected in 1996 and 97 hours of news broadcasts collected in 1997, made available by the Linguistic Data Consortium (LDC). In this task, we have used HUB-4 acoustic models trained on 140 hours of the 1996 and 1997 HUB-4 training data [39]. The models are tri-state within-word and cross-word triphone HMMs with no skips permitted between states. The model consists of 6000 senonically-tied states. The phone set for which models have been provided is that of the dictionary cmudict0.6d [40] available on the CMU website. The British English acoustic models were trained using the WSJCAM0 corpus. The training corpus contains 90 sentences spoken by each of 92 speakers. All recorded sentences were taken from the Wall Street Journal (WSJ) text corpus, and all recordings were made in a quasi-soundproof room. The phone set used here is the same 40-phone set from the CMU dictionary. The total vocabulary of this corpus is around 5000 words.

Results and Performance Analysis of Various English Acoustic Models

We have performed the analysis on test data that contains speech from 14 different speakers. Even though there are many variants of Indian English, two broad varieties (North Indian and South Indian English) are considered in the present work. The test data consists of 20 minutes of audio for each of 4 different NPTEL video lectures. In the results, the South Indian speakers of NPTEL video lectures are denoted as SI-6 and SI-7 and the North Indian speakers as NI-6 and NI-7. For the remaining 10 speakers, we manually recorded speech data for the operating systems domain; this data consists of five North Indian (NI-1 to NI-5) and five South Indian (SI-1 to SI-5) speakers. The details of the test data set are shown in Table 2. The total test data comprises 3 hours.

Table 2: Details of testing data set.
Speakers        Speech data (min)   No. of speakers
SI (NPTEL)      40                  2
NI (NPTEL)      40                  2
SI (recorded)   50                  5
NI (recorded)   50                  5

In the case of the British English acoustic models, the word error rates are very high (sometimes more than 100%) because they were trained on a very small speech corpus recorded in a noise-free environment, and hence they are not suitable for LVCSR experiments on video lectures. Therefore, we have considered only the HUB-4 and IE acoustic models for comparative analysis. The comparative analysis of the HUB-4 and IE acoustic models in terms of WER is shown in Fig.4.

Figure 4: The difference in WER between HUB-4 and IE model.

From the test results, one can observe that the Indian English acoustic model has performed better than the HUB-4 model, as shown by the large difference in WER in Fig.4. We have observed that the average WER of the IE acoustic model (38%) is around 34% lower than the average WER of the HUB-4 acoustic model (72%). This is because the HUB-4 acoustic model was trained entirely on the American English accent, which does not match the Indian English accent. Further, we have adapted the HUB-4 acoustic model for Indian speakers to observe any significant difference in WER after adaptation. The average WER of the adapted HUB-4 acoustic model is 67%, whereas the IE acoustic model without adaptation gives 38%. Even though we noticed a decrease of around 5% in WER on average, the average WER of the adapted HUB-4 acoustic model is still not comparable to that of the IE acoustic model without adaptation. From these results, we conclude that building a separate IE acoustic model is essential for decoding IE speech. So, we carried out our further experiments with the IE acoustic model alone.

Performance Analysis of Adapted IE Acoustic Model

To improve the performance of the IE acoustic model, we carried out the MLLR adaptation process. The data set used for adapting the IE acoustic model is different from the data set used for testing; its details are shown in Table 3. The WER comparison of the IE acoustic model before and after adaptation for all the speakers is shown in Fig.5. It can be observed that after adaptation the average WER is 31%, i.e. 7% less than the average WER before adaptation, as shown in Fig.6. Hence, we can conclude that the adaptation of the IE acoustic model helps in better recognition of IE lecture speech, as it reduces the mismatches caused by the speakers' characteristics.

Figure 6: The difference in average WER before and after adaptation.

Table 3: Details of adaptation data set.
Speakers        Speech data (min)   No. of speakers
SI (NPTEL)      20                  2
NI (NPTEL)      20                  2
SI (recorded)   25                  5
NI (recorded)   25                  5

In Fig.7, an example of our speech recognition system output is given for a better understanding of the results. From the example, one can observe the difference in WER between the IE and HUB-4 acoustic models. It can also be seen that the lecture transcription does not match written language, and it is very difficult to get such corpora for building the language models.

Figure 7: An example IE ASR system output.
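The WER figures reported above follow the standard definition: the word-level edit distance between the reference and hypothesis transcripts (substitutions + deletions + insertions) divided by the number of reference words. A self-contained sketch follows; the example sentences are invented.

```python
# WER via dynamic-programming edit distance over word sequences.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the process waits on a semaphore",
          "the process waits on semaphore"))   # one deletion over 6 words
```

Note that insertions can push WER above 100%, which is exactly what was observed for the WSJCAM0 models above.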
Figure 5: The difference in WER before and after adaptation.

Comparative Analysis of North and South IE Variants

Even though the model developed in the present work is referred to as the IE acoustic model, from [22] it is clear that IE has many variations due to different L1 influences, which lead to distinct colorations that give rise to specific regional varieties of spoken English. From Figure 8, it can be observed that on the test data set the average WER of all SI speakers is 14% less than the average WER of all NI speakers without adaptation, and 9% less after adaptation of the IE acoustic model. This could be because our pronunciation dictionary is more inclined towards the SI accent, as the pronunciation model is built from a dictionary manually created by South Indian speakers. The result highlights the dissimilarity between the North and South Indian accents, which indicates the need for multiple pronunciation dictionaries that would enable a better speech recognition system for IE varieties.

Figure 8: The difference in North Indian and South Indian WERs.

Conclusion

In the present work, we have carried out speech recognition experiments on IE video lectures. We have investigated the need for building a separate acoustic model for IE rather than using other existing English acoustic models, such as HUB-4 (AE) or WSJCAM0 (BE), for the IE speech recognition task. From the results, it is evident that the IE acoustic model outperformed HUB-4, with a 34% lower average WER for IE speech recognition. Hence, we conclude that a separate IE acoustic model is required for IE LVCSR experiments, because IE pronunciation is much different from the American and British English accents. Next, we investigated the performance of our IE acoustic model on the IE lecture recognition task. The average WER before and after adaptation is 38% and 31% respectively. Even though our IE acoustic model is trained with limited data (around 23 hours) and the corpora used for building the language models do not mimic the language spoken in the video lectures, the results are promising and comparable to the results reported for AE lecture recognition in the literature. Further, we have observed that South Indian speech is better recognized than North Indian speech.
This is due to the fact that our pronunciation dictionary is inclined towards the South Indian accent. There are two possible future endeavors. One is to improve the performance of IE acoustic model by adding large vocabulary speech corpora for Indian English to the existing training set. Second is to deal with the discrepancies between variants of Indian English accents by building pronunciation models for different accents. References [1] Kavi Narayana Murthy and G Bharadwaja Kumar. Language identification from small text samples*. Journal of Quantitative Linguistics, 13(01):57-80, [2] Adrian Akmajian. Linguistics: An introduction to language and communication. MIT press, [3] Hamid Behravan. Dialect and accent recognition. PhD thesis, University of Eastern Finland, [4] John C Wells. Accents of English, volume 1. Cambridge University Press, [5] Braj B Kachru. The Indianization of English: the English language in India. Oxford University Press Oxford, [6] Andreas Sedlatschek. Contemporary Indian English: variation and change. John Benjamins Publishing, [7] Ravinder Gargesh. Indian English: Phonology. Bernd Kortmann et al. Varieties of English: Africa, South and Southeast Asia, Mouton de Gruyter,, pages , [8] Peri Bhaskararao. English in contemporary India. ABD (Asian/Pacific Book Development), 33(2002):5-7, [9] NPTEL. [10] K Samudravijaya, R Ahuja, N Bondale, T Jose, S Krishnan, P Poddar, PVS Rao, and R Raveendran. A feature-based hierarchical speech recognition system for Hindi. Sadhana (Academy Proceedings in Engineering Sciences), 23: , [11] Mohit Kumar, Nitendra Rajput, and Ashish Verma. A large-vocabulary continuous speech recognition system for Hindi. IBM journal of research and development, 48(5.6): , [12] Rohit Kumar, S Kishore, Anumanchipalli Gopalakrishna, Rahul Chitturi, Sachin Joshi, Satinder Singh, and R Sitaram. Development of Indian language speech databases for large vocabulary speech recognition systems. 
In Proceedings of SPECOM.
[13] Pratyush Banerjee, Gaurav Garg, Pabitra Mitra, and Anupam Basu. Application of triphone clustering in acoustic modeling for continuous speech recognition in Bengali. In 19th International Conference on Pattern Recognition (ICPR 2008), pages 1-4.
[14] R Thangarajan, AM Natarajan, and M Selvam. Word and triphone based approaches in continuous speech recognition for Tamil language. WSEAS Transactions on Signal Processing, 4(3):76-86.
[15] G Lakshmi Sarada, A Lakshmi, Hema A Murthy, and T Nagarajan. Automatic transcription of continuous speech into syllable-like units for Indian languages. Sadhana, 34(2).
[16] Y Ma, MP Paulraj, S Yaacob, AB Shahriman, and SK Nataraj. Speaker accent recognition through statistical descriptors of mel-bands spectral energy and neural network model. In IEEE Conference on Sustainable Utilization and Development in Engineering and Technology.
[17] Chao Huang, Tao Chen, and Eric Chang. Accent issues in large vocabulary continuous speech recognition. International Journal of Speech Technology, 7(2-3).
[18] Herman Kamper, Félicien Jeje Muamba Mukanya, and Thomas Niesler. Multi-accent acoustic modelling of South African English. Speech Communication, 54(6).
[19] Kaustubh Kulkarni, Sohini Sengupta, V Ramasubramanian, Josef G Bauer, and Georg Stemmer. Accented Indian English ASR: Some early results. In Spoken Language Technology Workshop, 2008.
[20] Shamalee Deshpande, Sharat Chikkerur, and Venu Govindaraju. Accent classification in speech. In Fourth IEEE Workshop on Automatic Identification Advanced Technologies.
[21] Olga Kalashnik and Janet Fletcher. An acoustic study of vowel contrasts in North Indian English. In Proceedings of the XVI International Congress of Phonetic Sciences, Germany.
[22] Shrikant Joshi and Preeti Rao. Acoustic models for pronunciation assessment of vowels of Indian English. In International Conference on O-COCOSDA/CASLRE, pages 1-6.
[23] Caroline R Wiltshire and James D Harnsberger. The influence of Gujarati and Tamil L1s on Indian English: A preliminary study. World Englishes, 25(1):91-104.
[24] Hema Sirsa and Melissa A Redford. The effects of native language on Indian English sounds and timing patterns. Journal of Phonetics, 41(6).
[25] M Bisani and H Ney. Joint-sequence models for grapheme-to-phoneme conversion. Speech Communication, 50(8).
[26] Daniel DK Sleator and Davy Temperley. Parsing English with a link grammar. arXiv preprint cmp-lg/.
[27] SphinxTrain. /tutoria-lam.
[28] Senones. concepts.
[29] Mark JF Gales. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech & Language, 12(2):75-98.
[30] Wikipedia. download.
[31] WP2TXT.
[32] Vesa Siivola, Mathias Creutz, and Mikko Kurimo. Morfessor and VariKN machine learning tools for speech and language technology. In INTERSPEECH.
[33] Andreas Stolcke et al.
SRILM - an extensible language modeling toolkit. In INTERSPEECH.
[34] Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, and Joe Woelfel. Sphinx-4: A flexible open source framework for speech recognition. Sun Microsystems, Inc.
[35] Paul Lamere, Philip Kwok, William Walker, Evandro B Gouvea, Rita Singh, Bhiksha Raj, and Peter Wolf. Design of the CMU Sphinx-4 decoder. In INTERSPEECH.
[36] SphinxTutorial. www.speech.cs.cmu.edu/sphinx/tutorial.html.
[37] Yonghong Yan, Xintian Wu, Johan Schalkwyk, and Ron Cole. Development of CSLU LVCSR: the 1997 DARPA HUB4 evaluation system. Complexity, 24(14):7-27.
[38] Jeroen Fransen, Dave Pye, Tony Robinson, Phil Woodland, and Steve Young. WSJCAM0 corpus and recording description.
[39] CMU. s/hub4opensrc jan2002/info ABOUT MODELS.
[40] SphinxDictionary. -bin/cmudict.