Data-Driven Approach to Designing Compound Words for Continuous Speech Recognition

George Saon and Mukund Padmanabhan, Senior Member, IEEE

(Manuscript received October 19, 1999. This work was supported in part by DARPA. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jerome R. Bellegarda. The authors are with the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; e-mail: saon@watson.ibm.com.)

Abstract: In this paper, we present a new approach to deriving compound words from a training corpus. The motivation for making compound words is that, under some assumptions, speech recognition errors occur less frequently in longer words. Furthermore, compound words also enable more accurate modeling of pronunciation variability at the boundary between adjacent words in a continuously spoken utterance. We introduce a measure based on the product between the direct and the reverse bigram probability of a pair of words for finding candidate pairs from which to create compound words. Our experimental results show that by augmenting both the acoustic vocabulary and the language model with these new tokens, the word recognition accuracy can be improved by 2.8% absolute (7% relative) on a voicemail continuous speech recognition task. We also compare the proposed measure for selecting compound words with other measures that have been described in the literature.

I. INTRODUCTION

One of the observations that can be made in speech recognition systems is that short words are more frequently misrecognized. This is indicated in Fig. 1, which represents the number of errors made in all words of a specified length (length being defined as the average number of phones in the baseforms of the words). The results for this figure were obtained by decoding the training data of the voicemail corpus (representing 40 h of spontaneous telephone speech) in the following way. Two language models were trained, one from the transcriptions of the first 20 h (LMa) and the second from the transcriptions of the last 20 h (LMb). The first 20 h of the training data were then decoded using LMb and the last 20 h with LMa.

Fig. 1. Word error rate versus word length (expressed as number of phones in the word).

These results are intuitively understandable: in a longer phone sequence, it is necessary to make more errors in order to get the word wrong. If we consider the different words in the vocabulary as sequences of phones and make the following assumptions:

1) no phone sequence in the vocabulary is a subset of any other phone sequence in the vocabulary;
2) the probability of error is the same, p, for all phones;
3) a majority of the phones in a baseform need to be erroneously decoded for the word to be wrong,

then the probability of making an error in a word with a baseform of length n is given by

  P_e(n) = sum_{k = floor(n/2)+1}^{n} C(n, k) p^k (1 - p)^(n - k).

For values of p around 0.3 (which is consistent with what we observed in the training data), P_e(n) can be seen to decrease as n increases, implying that longer words are less frequently misrecognized (with the exception of phone lengths between six and nine, where the tendency seems to be reversed).
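
As a quick numerical check of the error model above (a minimal sketch under the stated assumptions; the phone error probability p = 0.3 comes from the text, everything else is illustrative):

```python
from math import comb

def word_error_prob(n, p=0.3):
    """Probability that a majority of the n phones in a baseform are
    misrecognized, assuming independent phone errors with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in range(1, 11):
    print(n, round(word_error_prob(n), 3))
```

With p = 0.3, the value drops from 0.30 for a one-phone word to about 0.06 at eight phones, mirroring the overall downward trend of Fig. 1.
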
The second observation is that the pronunciation variability of words is greater in spontaneous, conversational speech than in carefully read speech, where the uttered words are closer to their canonical representations (baseforms). One can argue that, by increasing the vocabulary of alternate pronunciations of words (the acoustic vocabulary), most of the speech variability can be captured in the spontaneous case. However, an increase in the number of alternate pronunciations is usually followed by an increase in the confusability between words, since different words can end up having close or even identical pronunciation variants.

Most coarticulation effects arise at the boundary between adjacent words and result in alterations of the last phones of the first word and the first few phones of the second word. One method to model these changes is the use of crossword phonological rewriting rules as proposed in [5]; this provides a systematic way of taking into account coarticulation phenomena such as geminate or plosive deletion (e.g., WENT TO -> W EH N T UW), palatalization (e.g., GOT YOU -> G AO CH AX), etc. An alternative way of dealing with coarticulation effects at word boundaries is to merge specific pairs of words into single compound words (also called multi-words [3], phrases [6], [8], [10], [11], or sticky pairs [2]) and to provide special coarticulated pronunciation variants for these new tokens. For instance, frequently occurring pairs such as KIND OF, LET ME, and LET YOU can be viewed as single words (KIND-OF, LET-ME, LET-YOU), which are often pronounced K AY N D AX, L EH M IY, and L EH CH AX, respectively.
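
As an illustration of what such a merged lexicon might look like (a minimal sketch; the entries and the phone strings for the single words are illustrative, only the compound-word variants above come from the text):

```python
# Acoustic vocabulary fragment: each token maps to a list of pronunciation
# variants (baseforms). Compound words carry an extra, coarticulated variant
# in addition to the concatenation of their constituents' baseforms.
lexicon = {
    "KIND":    [["K", "AY", "N", "D"]],
    "OF":      [["AX", "V"]],
    "KIND-OF": [
        ["K", "AY", "N", "D", "AX", "V"],   # concatenated baseform
        ["K", "AY", "N", "D", "AX"],        # coarticulated variant ("kinda")
    ],
    "LET":     [["L", "EH", "T"]],
    "ME":      [["M", "IY"]],
    "LET-ME":  [
        ["L", "EH", "T", "M", "IY"],
        ["L", "EH", "M", "IY"],             # coarticulated variant ("lemme")
    ],
}
```

During decoding, the compound token competes with its two-word sequence, which is why the selection criteria of Section II try to limit the added confusability.
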

In this paper, we present a new approach to deriving compound words from a training corpus. Compound words have a fortiori longer phone sequences than their constituents; consequently, one would expect them to be misrecognized less frequently. Furthermore, they also enable more accurate modeling of pronunciation variability at the boundary between adjacent words in a continuously spoken utterance. We suggest and experiment with a number of acoustic and linguistic measures to select these compound words, and present results that indicate that up to a 7% relative improvement can be obtained by adding a small number of compound words to the vocabulary.

The rest of the paper is organized as follows. In Section II, we investigate the effect of adding compound words to the language model and describe the various measures that we used for deriving compound words. In Section III, we discuss the experiments and results. Concluding remarks are presented at the end of the paper.

II. MEASURES FOR DERIVING COMPOUND WORDS

Though the motivation for adding compound words to the vocabulary is clear, as mentioned previously, adding more tokens or pronunciation variants to the acoustic vocabulary and/or the language model could increase the confusability between words. Hence, the candidate pairs for compound words have to be chosen carefully in order to avoid this increase. Intuitively, such a pair has to meet several requirements [9].

1) The pair of words has to occur frequently in the training corpus. There is no gain in adding a pair with a low count to the vocabulary, since the chances of encountering that pair during the decoding of unseen data will be low. Moreover, the compound word issued from this pair will contribute to the acoustic confusability with other words which are more likely according to the language model.

2) The words within the pair have to occur frequently together and more rarely in the context of other words. This requirement is necessary since one very frequent word, say w, can be part of several different frequent pairs, say (w, x1), (w, x2), etc. If all these pairs were to be added to the vocabulary, then the confusability between w and the pairs (w, x1) or (w, x2) would be increased, especially if word w has a short phone sequence. This will result in insertions or deletions of the word w when incorrectly decoding the word x1 or the sequence (w, x1) (or (w, x2)). A concrete example is given by the function word THE, which can occur in numerous different contexts (such as IN-THE, OF-THE, ON-THE, AT-THE, etc.), all of which are frequent.

3) The words should ideally present coarticulation effects at the juncture, i.e., their continuous pronunciation should be different from when they are uttered in isolation. Unfortunately, this requirement is not always compatible with the previous ones; in other words, the word pairs which have strong coarticulation effects do not necessarily occur very often, nor do the individual words occur only together. Consider, for instance, the sequence BYE-BYE, often pronounced B AX B AY, which is relatively rare in our database, whereas the individual word BYE appears in most voicemail messages.

The use of compound words has been suggested by several researchers and has been shown to improve speech recognition performance for various tasks [1]-[3], [6], [8], [10]-[12]. We will make further references to the different approaches throughout this paper as we examine possible metrics for selecting compound words.
These measures can be broadly classified into language-model-oriented and acoustic-oriented measures, depending on whether the information that is being used is entirely textual or includes acoustic confusability, such as phone recognition rate or coarticulated versus non-coarticulated baseform (or word pronunciation variant, or lexeme) recognition rate.

A. Effect of Compound Words on the Language Model

Before describing the methods related to selecting the compound words, it is instructive to see what effect the addition of these words has on the language model. Let us assume that the lexicon has been constructed, with the compound words selected according to some measure, and examine the effect on the language model. Language models are generally characterized by the log likelihood of the training data and by the perplexity. The log likelihood of a sequence of words w_1, ..., w_N (representing the training data) can be obtained simply as

  L(W) = sum_{i=1}^{N} log P(w_i | w_1, ..., w_{i-1}).    (1)

The usual n-gram assumption limits the number of terms in the conditioning in (1) to the n-1 preceding words. Hence, the log likelihood of the training data assuming a unigram or a bigram model would be, respectively,

  L_1(W) = sum_{i=1}^{N} log P(w_i),   L_2(W) = sum_{i=1}^{N} log P(w_i | w_{i-1}).    (2)

We can also define an average log likelihood per word as L(W)/N. For the bigram model, this may also be written as

  (1/N) L_2(W) = (1/N) sum_{i=1}^{N} log P(w_i | w_{i-1}) ≈ sum_{u,v} P(u, v) log P(v | u) = -H(w_i | w_{i-1}).    (3)

Hence, the average log likelihood per word is related to the conditional entropy of w_i given w_{i-1}. The perplexity of the language model is defined in terms of the inverse of the average log likelihood per word [7]. It is an indication of the average number of words that can follow a given word (a measure of the predictive power of the language model). Hence

  Perplexity = exp( -L(W)/N ) = P(w_1, ..., w_N)^(-1/N).    (4)
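
To make (1)-(4) concrete, here is a small sketch (illustrative only; the toy corpus and the maximum-likelihood estimates are ours, and no smoothing is applied) that computes the bigram log likelihood of a word sequence and the corresponding perplexity:

```python
import math
from collections import Counter

words = "kind of a long message kind of short".split()

unigram = Counter(words)
bigram = Counter(zip(words[:-1], words[1:]))
N = len(words)

# Maximum-likelihood bigram probabilities P(w_i | w_{i-1}); the first word is
# scored with its unigram probability for simplicity.
logL = math.log(unigram[words[0]] / N)
for prev, cur in zip(words[:-1], words[1:]):
    logL += math.log(bigram[(prev, cur)] / unigram[prev])

avg_logL = logL / N                 # average log likelihood per word, cf. (3)
perplexity = math.exp(-avg_logL)    # cf. (4)
print(round(avg_logL, 3), round(perplexity, 3))
```

Merging a pair into a compound token shortens the word sequence, which is one reason why, as discussed later, perplexities of the augmented and original models are not directly comparable.
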

1) Unigram Model - Difference in Log Likelihood: Consider the probability of a sequence of two words w_1 and w_2, and assume that the pair (w_1, w_2) is a candidate compound word. The probability of this word sequence assuming a unigram language model is given by

  P(w_1) P(w_2).    (5)

Now consider replacing the pair of words w_1 and w_2 in the original lexicon with the compound word w_1:w_2. The likelihood of the word sequence becomes

  P(w_1:w_2) = P(w_1, w_2).    (6)

Comparing (5) and (6), the difference in log probability is given by

  log [ P(w_1, w_2) / ( P(w_1) P(w_2) ) ].    (7)

This can be seen to represent the mutual information between the words w_1 and w_2, and it forms the basis of the first linguistic measure. A similar discussion of the link between the likelihood and the average mutual information between adjacent classes is provided by Brown et al. in [2].

2) Bigram Model - Difference in Log Likelihood: An analogous reasoning can be applied in the case of a bigram language model by considering the pair w_1, w_2 in context, conditioned on the preceding word w_0 and followed by the word w_3. Replacing the pair with the compound word w_1:w_2 and proceeding as in the unigram case (equations (8)-(11)), the difference in log likelihood takes the form

  log [ P(w_2 | w_0, w_1) P(w_1 | w_2, w_3) ] - log [ P(w_2 | w_1) P(w_1 | w_2) ],    (12)

i.e., the compound word has the effect of incorporating a trigram dependency in a bigram language model. The denominator in (12) is the product of the forward and the reverse bigram probability of w_1 and w_2, and the numerator is the product of the forward and the reverse trigram probability of w_1 and w_2.

B. Language Model Measures

The first measure that we consider is the mutual information between two consecutive words [3], [6], [11], [12], which is defined as

  M_LM1(w_1, w_2) = log [ P(w_1, w_2) / ( P(w_1) P(w_2) ) ].    (13)

From (7), this choice of compound words may be seen to be motivated by the desire to maximize the difference in log likelihood of the training data for the two lexicons when a unigram model is used. A weighted variant of the mutual information was proposed in [2] as a criterion for finding sticky pairs. Most authors, however, use it in its unweighted form; that is, they choose the pairs so as to maximize the mutual information between the words regardless of the frequency of the pairs (see, for example, [6] and [12]). In [11], the mutual information is used only to select candidate pairs; the final decision of turning pairs into compound words is made based on bigram perplexity reduction.

The second measure that we propose is based on defining a direct bigram probability between the words w_1 and w_2 as P(w_2 | w_1), and a reverse bigram probability as P(w_1 | w_2). The reverse bigram probability as a standalone measure has been mentioned in [10] (called backward bigram) and in [1] (called left probability). Both the direct and the reverse bigrams can be simply estimated from the training corpus as

  P(w_2 | w_1) = C(w_1, w_2) / C(w_1),   P(w_1 | w_2) = C(w_1, w_2) / C(w_2),    (14)

where C(.) denotes the count in the training corpus. The measure that we used is the geometric average of the direct and the reverse bigram

  M_LM2(w_1, w_2) = sqrt( P(w_2 | w_1) P(w_1 | w_2) ).

This measure has also been independently introduced in [1] (called mutual probability) and is similar to the correlation coefficient proposed recently by Kuo [8], which can be written as

  rho(w_1, w_2) = P(w_1, w_2) / [ ( P(w_1) + P(w_2) ) / 2 ].

The similarity between the two arises from the fact that they both divide the joint probability by a mean of the marginals P(w_1) and P(w_2), with the main difference lying in the choice of an arithmetic versus a geometric mean of the marginals. Note that 0 <= M_LM2(w_1, w_2) <= 1 for every pair of words. A high value for M_LM2(w_1, w_2) means that both the direct and the reverse bigrams are high for (w_1, w_2); in other words, the probabilities that w_1 is followed by w_2 and that w_2 is preceded by w_1 are both high, which makes the pair a good candidate for a compound word according to our second requirement.
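
A small sketch of how M_LM2 and the mutual information can be estimated from raw pair counts (illustrative; the count cutoff shown here is an assumption, while the 0.2 threshold on M_LM2 is the one used in Section III):

```python
import math
from collections import Counter

def compound_candidates(words, min_count=100, threshold=0.2):
    """Score adjacent word pairs with the bigram-product measure M_LM2 and
    return those exceeding the threshold, together with their mutual information."""
    n = len(words)
    unigram = Counter(words)
    bigram = Counter(zip(words[:-1], words[1:]))
    selected = []
    for (w1, w2), c12 in bigram.items():
        if c12 < min_count:
            continue
        direct = c12 / unigram[w1]               # P(w2 | w1), cf. (14)
        reverse = c12 / unigram[w2]              # P(w1 | w2), cf. (14)
        m_lm2 = math.sqrt(direct * reverse)      # geometric average
        p12, p1, p2 = c12 / (n - 1), unigram[w1] / n, unigram[w2] / n
        mi = math.log(p12 / (p1 * p2))           # mutual information, cf. (13)
        if m_lm2 >= threshold:
            selected.append((w1, w2, m_lm2, mi))
    return sorted(selected, key=lambda x: -x[2])
```

A pair such as IN THE is penalized by this measure because THE is preceded by many different words, which keeps the reverse bigram P(IN | THE) low, in line with the second requirement of Section II.
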

In our implementation, we selected all pairs of words for which this measure is greater than a fixed threshold and for which the raw count of the word pair exceeds another predefined threshold.

It may be seen that the mutual information measure has much in common with the bigram product measure. Intuitively, a high mutual information between two words means that they occur often together in the training corpus (the pair count is comparable with the individual counts), and in this sense it is similar to the bigram product measure. However, the bigram product measure imposes an additional constraint in that it not only requires w_1 and w_2 to occur together, but also prevents them from occurring in conjunction with other words. Further, from (12), it is not apparent that the log likelihood improves with the use of compound words chosen by M_LM2, because this measure maximizes the denominator term of the likelihood difference. The log likelihood is generally directly related to the perplexity; however, perplexities cannot be compared for language models with different vocabularies. Some authors suggest the use of a normalized perplexity (where the average log likelihood of the training data is computed with respect to the original number of words [1], [11]) and even design the compound words so as to directly optimize this quantity [8], [10], [11]. This turns out to be equivalent to increasing the total likelihood of the training corpus.

C. Acoustic Measures

Neither the bigram product measure nor the mutual information takes into account coarticulation effects at word boundaries, since they are language-model-oriented measures. These coarticulation effects have to be added explicitly for the pairs which become compound words according to these metrics, either by using phonological rewriting rules or by manually designing coarticulated baseforms where appropriate. The second part of our study is centered around the use of explicit acoustic information when designing compound words.

The first acoustic measure (denoted M_AC1) deals explicitly with coarticulation phenomena and can be summarized as follows. For the pairs of words in the training corpus which present such phenomena, according to the applicability of at least one phonological rewriting rule [5], one can compare the number of times that a coarticulated baseform for the pair is preferred over a concatenation of the non-coarticulated individual baseforms of the words forming that pair. This can be estimated by doing a Viterbi alignment of all instances of the word pair in the training data, once with the coarticulated pair baseform and once with the concatenation of individual baseforms, and selecting the baseform which has the higher acoustic score. If N_coart denotes the number of times that the coarticulated baseform is preferred, and N_concat denotes the number of times that the concatenated baseform is preferred, the measure is defined as the ratio between these two counts

  M_AC1(w_1, w_2) = N_coart(w_1, w_2) / N_concat(w_1, w_2).

If this ratio is bigger than a threshold (which is set in practice to 1), then the pair is turned into a compound word.
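
A sketch of how this preference ratio could be tallied, assuming an external aligner that returns acoustic scores for a given baseform (the function viterbi_score and its arguments are placeholders for whatever forced-alignment routine is available):

```python
def coarticulation_ratio(instances, coart_baseform, concat_baseform, viterbi_score):
    """Count how often the coarticulated baseform beats the concatenated one.

    `instances` is an iterable of acoustic segments containing the word pair;
    `viterbi_score(segment, baseform)` is assumed to return the Viterbi
    alignment score of the segment against the given phone sequence.
    """
    n_coart = n_concat = 0
    for segment in instances:
        if viterbi_score(segment, coart_baseform) > viterbi_score(segment, concat_baseform):
            n_coart += 1
        else:
            n_concat += 1
    # Ratio M_AC1; the pair becomes a compound word when the ratio exceeds 1.
    return n_coart / n_concat if n_concat else float("inf")
```
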
The second acoustic measure is more related to the acoustic confusability of a word. Let us assume that word w_1 has a low probability of correct classification. One would expect that, by tying w_1 to a word w_2 which has a higher phone classification accuracy, the compound word w_1:w_2 (or w_2:w_1) would have a higher classification accuracy. The second measure that we define computes a quantity related to the probability of correct classification for the different pronunciation variants of the compound word. We first define a probability of correct classification for a word w, P_c(w), from the correct-classification probabilities of the phones in its baseform, normalized by the number of phones in the baseform; the probability of correct classification for a phone is computed by decoding the training data and simply counting the number of times that the phone was correctly recognized. The second acoustic measure for selecting compound words, M_AC2(w_1, w_2), is then given by this probability of correct classification computed for the pronunciation variants of the compound word. The word pairs that maximize this measure are selected as compound words.

III. EXPERIMENTS AND RESULTS

All the experiments were performed on a telephony voicemail database comprising about 40 h of speech [9]. The language model is a conventional linearly interpolated trigram model [7] and was trained on approximately 400 K words of text. The effect of adding compound words is to increase the span of the LM beyond trigrams. We have not attempted, however, to compare a weaker LM (say, a bigram LM) augmented with compound words with the corresponding trigram or higher-order n-gram LM, as was suggested by one reviewer. The size of the acoustic vocabulary for the application is 14 K words. The results are reported on a set of 43 voicemail messages (roughly 2000 words).

The experimental setup is as follows. We started with a vocabulary that had no compound words and applied every measure iteratively to increase the number of compound words in the vocabulary. After one iteration, the word pairs that scored more than a threshold were transformed into compound words, and all instances of the pairs in the training corpus were replaced by these new words. Both the acoustic vocabulary and the language model vocabulary were augmented with these words after each step. In the following tables, underlined compound words indicate that the compound words also had coarticulated baseforms which were added to the acoustic vocabulary. Also, the number of pairs reported for each iteration indicates how many compound words were added to the vocabulary during that iteration. We will first describe the results with M_LM2, as this measure gave us the best performance.
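
The iterative procedure described above might look roughly like the following (a sketch only; score_pairs stands for whichever measure is being used and is assumed to apply its own pair-count cutoff, while the threshold default is the one reported below for M_LM2):

```python
def add_compound_words(corpus, score_pairs, threshold=0.2, iterations=3):
    """Iteratively merge high-scoring adjacent word pairs into compound tokens.

    `corpus` is a list of word tokens; `score_pairs(corpus)` is assumed to
    return a dict {(w1, w2): score} over adjacent pairs.
    """
    compounds = set()
    for _ in range(iterations):
        scores = score_pairs(corpus)
        selected = {pair for pair, s in scores.items() if s >= threshold}
        if not selected:
            break
        compounds |= selected
        # Replace every instance of a selected pair with the merged token.
        merged, i = [], 0
        while i < len(corpus):
            if i + 1 < len(corpus) and (corpus[i], corpus[i + 1]) in selected:
                merged.append(corpus[i] + "-" + corpus[i + 1])
                i += 2
            else:
                merged.append(corpus[i])
                i += 1
        corpus = merged
    return corpus, compounds
```

After each pass, both the acoustic vocabulary and the language model would be rebuilt over the merged corpus; the sketch leaves that step out.
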

TABLE I. RECOGNITION SCORES AND PERPLEXITIES FOR MEASURE M_LM2

TABLE II. RECOGNITION SCORES AND PERPLEXITIES FOR MEASURES M_LM1, M_AC1, AND M_AC2

For the second language model measure, which is based on the product between the direct and the reverse bigram, the threshold was chosen to be 0.2, i.e., if M_LM2(w_1, w_2) >= 0.2, then (w_1, w_2) would be made a compound word. This threshold was chosen so as to obtain approximately the same number of compound words in the end as in the case where they were designed by hand. Table I summarizes the number of new compound words obtained after each iteration, examples of such words, and the word error rate as well as the perplexity of the test set (the normalized perplexity is denoted by a *). The last line of Table I also indicates the beneficial effect of adding coarticulated baseforms to the vocabulary, even when the compound words are chosen strictly based on a linguistic measure. The only difference between Iterations 3 and 3b in Table I is that in the former case, baseforms were added to the vocabulary to account for the coarticulation in the selected compound words, whereas in the latter case (3b), the baseforms were simply a concatenation of the baseforms of the individual components. This seems to indicate that though a significant gain can be obtained by selecting compound words based only on a linguistic measure, the gain can be further enhanced by allowing for a coarticulated pronunciation of these selected compound words.

For the remaining measures (M_LM1, M_AC1, and M_AC2), the thresholds were set so as to obtain the same number of words (or pairs) after each iteration as for the M_LM2 case. We believe that this facilitates a fair comparison between the performances of the different measures. A threshold on the pair count was also applied (100 or 300, depending on the measure). The performances of these measures are illustrated in Table II. It may be seen that there is virtually no improvement from using any of these other measures. The bigram product measure outperforms the mutual information metric because the latter tends to pick words which co-occur frequently (i.e., the first condition in Section II) without paying heed to whether the same constituent words also co-occur frequently with other words (the second condition in Section II). Another observation from Tables I and II is that, for the same number of pairs after the first iteration (42), the difference in perplexity is significant between the language models based on M_LM2 and M_LM1. Surprisingly, the better performance is obtained for the language model with the higher perplexity.¹

The poor performance of the acoustic measures can be explained by the fact that neither M_AC1 nor M_AC2 takes into account word pair frequency information. Besides, there is no measure of the degree of stickiness of a pair as in the case of the language-model-oriented measures (by stickiness, we mean frequency of co-occurrence of the word pair, i.e., word w_1 tends to stick to word w_2).

¹As was pointed out by the reviewers, perplexity cannot really be compared across different vocabularies. The normalized perplexity (also shown in the tables) is supposedly a better indicator of task complexity in this case, but our results did not seem to indicate any great correlation between the word error rate and the normalized perplexity either.

This tends to increase the acoustic confusability between words in the vocabulary, since a frequent word can now be part of many pairs.

TABLE III. PERPLEXITY AND RECOGNITION PERFORMANCE USING MANUALLY DESIGNED COMPOUND WORDS

Finally, Table III shows the performance of a set of 58 manually designed compound words suited for the voicemail recognition task. It is generally the case that tuning the speech recognition system to a particular task (for instance, by manually selecting the compound words) is a process that does tend to improve performance on the task; however, this represents a tedious and time-consuming process. Consequently, it is encouraging to see that the statistically derived measure (which can be implemented relatively easily on a new task) is able to approach the same performance, even though it uses a few more compound words.

IV. DISCUSSION

In this paper, we experimented with a number of methods to design compound words to augment the vocabulary of a speech recognition system. The motivation for combining pairs of words to form compound words is twofold: 1) experimental observations indicate that longer phone sequences are less likely to be misrecognized, and 2) compound words enable cross-word coarticulation effects to be easily modeled. We experimented with both linguistic and acoustic measures for selecting these compound words. The linguistic measures were related to the mutual information between word pairs and to a new measure, the product of the forward and reverse bigram probability of the word pair. The acoustic measures were based on whether the word pair had a significant amount of cross-word coarticulation. Our experimental results indicated that the second linguistic measure was particularly useful in selecting compound words. Even though we found that selecting compound words on the basis of acoustic measures was not useful, we found that when the compound words were selected based on the linguistic measure, it was beneficial to add coarticulated baseforms, where necessary, for the selected compound words. The experimental results show an overall relative improvement in word error rate of 7% and performance comparable to that obtained with manually designed compound words. The main conclusion that can be drawn is that effective metrics for designing compound words should depend upon language model information such as the frequency of pairs and the degree of closeness of a pair (how often the words of a pair occur together). Once the pairs have been found, the modeling of coarticulation effects at word boundaries within the pairs (where applicable) may further improve the overall performance.

REFERENCES

[1] C. Beaujard and M. Jardino, "Language modeling based on automatic word concatenations," in Proc. Eurospeech, Budapest, Hungary, 1999.
[2] P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer, "Class-based n-gram models of natural language," Comput. Linguist., vol. 18, no. 4, pp. 467-479, 1992.
[3] M. Finke and A. Waibel, "Speaking mode dependent pronunciation modeling in large vocabulary conversational speech recognition," in Proc. Eurospeech, Rhodes, Greece, 1997.
[4] M. Finke, "Flexible transcription alignment," in Proc. 1997 IEEE Workshop on Speech Recognition and Understanding, Santa Barbara, CA, 1997.
[5] E. P. Giachin, A. E. Rosenberg, and C. H. Lee, "Word juncture modeling using phonological rules for HMM-based continuous speech recognition," Comput. Speech Lang., vol. 5, 1991.
[6] E. P. Giachin, "Phrase bigrams for continuous speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Detroit, MI, 1995.
[7] F. Jelinek, Statistical Methods for Speech Recognition (Language, Speech and Communication Series). Cambridge, MA: MIT Press, 1997.
[8] H. K. J. Kuo and W. Reichl, "Phrase-based language models for speech recognition," in Proc. Eurospeech, Budapest, Hungary, 1999.
[9] M. Padmanabhan, G. Saon, S. Basu, J. Huang, and G. Zweig, "Recent improvements in voicemail transcription," in Proc. Eurospeech, Budapest, Hungary, 1999.
[10] K. Ries, F. D. Buo, and A. Waibel, "Class phrase models for language modeling," in Proc. Int. Conf. Spoken Language Processing, Philadelphia, PA, 1996.
[11] B. Suhm and A. Waibel, "Toward better language models for spontaneous speech," in Proc. Int. Conf. Spoken Language Processing, Yokohama, Japan, 1994.
[12] I. Zitouni, J. F. Mari, K. Smaili, and J. P. Haton, "Variable-length sequence language models for large vocabulary continuous dictation machine," in Proc. Eurospeech, Budapest, Hungary, 1999.

George Saon received the M.Sc. (1994) and Ph.D. degrees in computer science from the University Henri Poincaré, Nancy, France. From 1994 to 1998, he worked on stochastic modeling for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). He is currently with the IBM T. J. Watson Research Center, Yorktown Heights, NY, conducting research on large vocabulary conversational telephone speech recognition. His research interests are in pattern recognition and stochastic modeling.

Mukund Padmanabhan (S'89-M'89-SM'99) received the M.S. and Ph.D. degrees from the University of California, Los Angeles, in 1989 and 1992, respectively. Since 1992, he has been with the Speech Recognition Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, where he currently manages a group conducting research on aspects of telephone speech recognition. His research interests are in speech recognition and language processing algorithms, signal processing algorithms, and analog integrated circuits. He is coauthor of the book Feedback-Based Orthogonal Digital Filters: Theory, Applications, and Implementation.


More information

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George

More information

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier 1, Andy Way 2, Josef van Genabith

More information

5. UPPER INTERMEDIATE

5. UPPER INTERMEDIATE Triolearn General Programmes adapt the standards and the Qualifications of Common European Framework of Reference (CEFR) and Cambridge ESOL. It is designed to be compatible to the local and the regional

More information

Large vocabulary off-line handwriting recognition: A survey

Large vocabulary off-line handwriting recognition: A survey Pattern Anal Applic (2003) 6: 97 121 DOI 10.1007/s10044-002-0169-3 ORIGINAL ARTICLE A. L. Koerich, R. Sabourin, C. Y. Suen Large vocabulary off-line handwriting recognition: A survey Received: 24/09/01

More information

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016 AGENDA Advanced Learning Theories Alejandra J. Magana, Ph.D. admagana@purdue.edu Introduction to Learning Theories Role of Learning Theories and Frameworks Learning Design Research Design Dual Coding Theory

More information

Reinforcement Learning by Comparing Immediate Reward

Reinforcement Learning by Comparing Immediate Reward Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate

More information

Florida Reading Endorsement Alignment Matrix Competency 1

Florida Reading Endorsement Alignment Matrix Competency 1 Florida Reading Endorsement Alignment Matrix Competency 1 Reading Endorsement Guiding Principle: Teachers will understand and teach reading as an ongoing strategic process resulting in students comprehending

More information

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION

ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION ADVANCES IN DEEP NEURAL NETWORK APPROACHES TO SPEAKER RECOGNITION Mitchell McLaren 1, Yun Lei 1, Luciana Ferrer 2 1 Speech Technology and Research Laboratory, SRI International, California, USA 2 Departamento

More information

COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS

COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS Joris Pelemans 1, Kris Demuynck 2, Hugo Van hamme 1, Patrick Wambacq 1 1 Dept. ESAT, Katholieke Universiteit Leuven, Belgium

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature 1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details

More information

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft

More information

Quantitative Evaluation of an Intuitive Teaching Method for Industrial Robot Using a Force / Moment Direction Sensor

Quantitative Evaluation of an Intuitive Teaching Method for Industrial Robot Using a Force / Moment Direction Sensor International Journal of Control, Automation, and Systems Vol. 1, No. 3, September 2003 395 Quantitative Evaluation of an Intuitive Teaching Method for Industrial Robot Using a Force / Moment Direction

More information