Automatic pronunciation error detection in Dutch as a second language: an acoustic-phonetic approach


MA Thesis
Automatic pronunciation error detection in Dutch as a second language: an acoustic-phonetic approach
Khiet Truong
First Supervisor: Helmer Strik (University of Nijmegen)
Second Supervisor: Gerrit Bloothooft (Utrecht University)
Submitted to the Faculty of Arts of Utrecht University, The Netherlands

MA thesis by Khiet Truong (Dutch title: Automatische detectie van uitspraakfouten bij NT2-leerders: een akoestisch-fonetische aanpak)
Faculty of Arts, Utrecht University
Programme: General Linguistics, specialization: Computational Linguistics
First thesis supervisor: Helmer Strik (Catholic University of Nijmegen)
Second thesis supervisor: Gerrit Bloothooft (Utrecht University)
June 2004

Acknowledgements

The research for my MA thesis was carried out at the department of Language and Speech at the University of Nijmegen. From September to June 2004, I participated in the PROO project. I would like to take this opportunity to thank everyone at this department who helped me carry out the research and write my MA thesis. I would like to thank Lou Boves and Gerrit Bloothooft for making this traineeship possible. I would also like to thank the supervisors who guided me and helped me complete this thesis: Helmer Strik, Catia Cucchiarini, Ambra Neri (University of Nijmegen) and Gerrit Bloothooft (Utrecht University). Thank you, I have learned so much from you. The other members of the PROO group are also thanked for their help. Finally, the members of the department of Language and Speech at the University of Nijmegen and the 'scriptie groep' at Utrecht University are thanked for sharing their knowledge and giving feedback on my work and presentations.

Khiet Truong
Apeldoorn, June

Contents

Acknowledgements 3
Contents 4
1 Introduction
1.1 Background: CAPT within CALL
1.2 The aim of the present study
1.3 Structure of the thesis
2 Automatic detection of pronunciation errors: a small literature study
2.1 Introduction
2.2 Why do L2 learners produce pronunciation errors?
2.3 What kind of pronunciation errors do L2-learners make?
2.4 Possible goals in pronunciation teaching
2.5 Overview of automatic pronunciation error detection techniques in the literature
2.5.1 Overview of ASR-based techniques
2.5.2 Adding extra knowledge to acoustic models and ASR-based techniques
2.6 Automatic pronunciation error detection techniques employed in real-life applications
2.6.1 Employing ASR-based techniques in real-life CALL applications
2.6.2 Using acoustic-phonetic information in real-life CALL applications
3 The approach adopted in the present study
3.1 Introduction
3.2 An acoustic-phonetic approach to automatic pronunciation error detection

3.3 Selecting pronunciation errors
3.3.1 Goal of pronunciation teaching adopted in this study
3.3.2 The pronunciation errors addressed in this study
4 Material & Method
4.1 Introduction
4.2 Material
4.3 Algorithms used in this study
4.3.1 Linear Discriminant Analysis
4.3.2 Decision tree-based
5 The pronunciation error detectors /A/-/a:/ and /Y/-/u,y/
5.1 Introduction
5.2 Acoustic characteristics of /A/, /a:/, /Y/, /u/ and /y/
5.2.1 General acoustic characteristics of vowels
5.2.2 Acoustic differences between /A/ and /a:/
5.2.3 Acoustic differences between /Y/ and /u,y/
5.2.4 Acoustic features for vowel classification: experiments in the literature
5.3 Method & acoustic measurements
5.4 Experiments and results for /A/-/a:/ and /Y/-/u,y/
5.4.1 Organization of experiments
5.4.2 Experiments and results /A/-/a:/
5.4.3 Experiments and results /Y/-/u,y/
5.4.4 Experiments and results /Y/-/u/-/y/
5.5 Discussion of results
5.5.1 Discussion of the results of /A/-/a:/
5.5.2 Discussion of the results of /Y/-/u,y/
6 The pronunciation error detector /x/-/k,g/
6.1 Introduction
6.2 Acoustic characteristics of /x/, /k/ and /g/
6.2.1 General acoustic characteristics of consonants

6.2.2 Acoustic differences between /x/ and /k,g/
6.2.3 Acoustic features for fricatives versus plosives classification: experiments in the literature
6.3 Methods & acoustic measurements
6.3.1 Method I & acoustic measurements
6.3.2 Method II & acoustic measurements
6.4 Experiments and results for /x/-/k,g/
6.4.1 Organization of experiments
6.4.2 Experiments and results method I
6.4.3 Experiments and results method II
6.5 Discussion of results
7 Conclusions and summary of results
7.1 Introduction
7.2 Summary of /A/-/a:/
7.3 Summary of /Y/-/u,y/
7.4 Summary of /x/-/k,g/
7.5 Conclusions
References 113
A List of abbreviations 117
B List of phonetic symbols 118
C Scripts 121
D Sentences 127
E Amount of speech data 129
F Tables with classification scores 133
G How to read Whisker's Boxplot 147

Chapter 1

Introduction

1.1 Background: CAPT within CALL

Traditionally, pronunciation training received less attention than writing and grammar in foreign language teaching. Many language teachers believed that pronunciation did not deserve as much attention as other linguistic aspects such as grammar, mainly because they considered accent-free pronunciation a myth (Scovel, 1988), and thus an impossible goal to achieve. Among other factors, this view has had a rather negative influence on the amount of available information on how pronunciation can best be taught. Nowadays, it is generally agreed that a reasonably intelligible pronunciation is more important than an accent-free one. Unfortunately, pronunciation training is still often neglected in traditional classroom instruction, mainly because it is time-consuming: it requires a lot of time for practice from students and a lot of time from teachers for providing feedback.

Computer Aided Language Learning (CALL) systems can offer a solution. More specifically, a Computer Aided Pronunciation Training (CAPT) module within such a CALL system can tackle the problems associated with training pronunciation in a classroom environment, and offers many other advantages. Technology is nowadays increasingly integrated in teaching, and more specifically in foreign language teaching; many software applications that teach users foreign languages are available on the market. First of all, computers are more patient than human teachers, and are usually available without any time constraints. Secondly, computers provide a more individual way of learning

which allows students to practise their own pronunciation difficulties and work at their own pace, whereas in a traditional classroom environment it is difficult to focus on the needs of individual students. Moreover, student profiles can be logged by the system, so that improvement or problems can be monitored by the teacher or by the students themselves. Finally, a classroom environment can cause anxiety or stress for students; a CALL environment, which offers more privacy, can reduce this phenomenon, known as foreign language anxiety (Young, 1990). These collective advantages have led to an increasing interest in CALL, and more specifically CAPT, in the language teaching community. Developing CALL and CAPT systems offers challenges and new interdisciplinary areas of interest in the field of language teaching: technology has to be integrated in a language teaching system in such a way that it meets pedagogical requirements. Neri et al. (2002) describe the relationship between pedagogy and technology in CAPT courseware more closely.

CAPT can be integrated into a CALL system by using Automatic Speech Recognition (ASR) technology. An (ideal) ASR-based CAPT system can be described as a sequence of three phases:

1) Speech recognition: the first and most important phase, because the subsequent phases depend on its accuracy. In interactive dialogues with multiple-choice answers, the correct answer should be recognized by the system and all other answers should be discarded. Furthermore, the ASR time-aligns the spoken signal with phone labels; phases 2) and 3) are based on this time-alignment.

2) Scoring and error detection/diagnosis: the system evaluates the pronunciation quality and can give a global score. Pronunciation errors are located and the type of error is determined for phase 3).

3) Feedback: with the diagnosis of the pronunciation error, correct feedback can be given that meets the pedagogical requirements.
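These three phases could be organized as a minimal pipeline sketch. All class and function names below are hypothetical, and the alignment, scores and threshold are made up for illustration; phase 1 is a stub where a real system would call an ASR engine:

```python
from dataclasses import dataclass

@dataclass
class AlignedPhone:
    label: str      # phone label from the time-alignment
    start: float    # start time in seconds
    end: float      # end time in seconds
    score: float    # per-phone quality score assigned in phase 2

def recognize_and_align(audio):
    """Phase 1: recognize the utterance and time-align phone labels.
    Stub with made-up output; a real system would run an ASR engine here."""
    return [AlignedPhone("x", 0.00, 0.08, 0.9),
            AlignedPhone("A", 0.08, 0.20, 0.3)]

def detect_errors(phones, threshold=0.5):
    """Phase 2: flag phones whose quality score falls below a threshold."""
    return [p for p in phones if p.score < threshold]

def give_feedback(errors):
    """Phase 3: turn each detected error into a feedback message."""
    return [f"Check your pronunciation of /{p.label}/" for p in errors]

errors = detect_errors(recognize_and_align(None))
print(give_feedback(errors))   # one message, for the low-scoring /A/
```

The point of the sketch is only the dependency structure: phases 2 and 3 consume the time-alignment produced by phase 1, which is why recognition accuracy dominates overall system quality.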
Ideally, such a CAPT system should mimic the tasks of a human teacher and give the same judgements about the student's pronunciation as a human teacher would. CALL and CAPT systems are therefore usually evaluated by how well machine judgements agree or correlate with human judgements of the same speech material (human-machine correlations).
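As a toy illustration of such an evaluation, a Pearson correlation between human ratings and machine scores can be computed as follows (the rating and score values are invented, not data from this study):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy example: human ratings (1-10) vs. machine scores for five utterances.
human   = [7, 4, 9, 5, 8]
machine = [0.71, 0.40, 0.88, 0.55, 0.79]
print(round(pearson(human, machine), 3))   # 0.994, i.e. high agreement
```

A correlation close to 1 would mean the machine ranks utterances almost exactly as the human raters do; the low correlations reported for ASR confidence measures later in this thesis are to be read against this scale.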

1.2 The aim of the present study

The focus of this study is on automatic pronunciation error detection (phase 2 in the previous scheme) in the speech of foreign language learners. In our case, the foreign language is Dutch, learned by second language (L2) learners. In general, automatic pronunciation error detection techniques usually involve measures that are obtained by means of automatic speech recognition technology, generalized under the term confidence scores (see section 2.5), which in some way represent how certain the system is that signal X belongs to a pattern Y: a low confidence score may indicate bad pronunciation quality. These measures have the advantage that they can be obtained fairly easily, and that they can be calculated in similar ways for all speech sounds. However, ASR confidence measures also have the disadvantage that they are not very accurate: the average human-machine correlations they yield are rather low and, consequently, so is their power to predict pronunciation quality (see e.g. Kim et al., 1997). This lack of accuracy might be related to the fact that confidence scores are computed in the same way for all speech sounds, without focusing on the specific acoustic-phonetic features of individual sounds. These disadvantages of methods based on confidence measures have led to the present study, in which we investigate an alternative approach that may yield higher detection accuracy. In this study, we present an acoustic-phonetic approach to the detection of pronunciation errors at phone level. The goal of this study is formulated as:

Goal: to develop automatic acoustic-phonetic-based classification techniques for automatic pronunciation error detection in the speech of L2 learners of Dutch.

Related to this goal is the question of how well these automatic classification techniques perform in detecting pronunciation errors at phone level.
This is the main question addressed in this study and is formulated as the thesis question:

Thesis question: How effective are automatic acoustic-phonetic-based classification techniques in detecting pronunciation errors of L2 learners of Dutch?

In this context, effective means that, ideally, the techniques should be able to detect pronunciation errors just as humans do: machine judgements should resemble human judgements. For this purpose, a non-native speech database of Dutch was checked and annotated by human listeners on pronunciation

errors so that these human annotations (judgements) could be compared to machine judgements. The acoustic-phonetic approach (chapter 3) enables us to be more specific in developing pronunciation error detectors. First, we selected three pronunciation errors by carrying out a survey on an annotated non-native speech database. We found that the following three speech sounds were often mispronounced by non-native speakers and decided to address these three pronunciation errors in this study:

- /A/ mispronounced as /a:/
- /Y/ mispronounced as /u/ or /y/
- /x/ mispronounced as /k/ or /g/

For each pronunciation error, the acoustic differences between a correctly pronounced phone and an incorrectly pronounced phone are examined, and these acoustic differences, translated into acoustic-phonetic features, are used to develop a pronunciation error detector. Classification experiments and statistical analyses can show which specific features are most reliable for the detection of a particular pronunciation error (Q1). Another interesting issue to be examined in this study is the use of native or non-native speech as training material: it is still not clear whether a detector should be trained on native or on non-native speech material to achieve the highest detection accuracy, without degrading the performance for native speakers (Q2). For the pronunciation error of /x/, we will examine two different methods: one that uses Linear Discriminant Analysis and one that uses a decision tree to classify a sound as either correct or incorrect. Is there a preference for one method over the other (Q3)? Thus, in addition to the main question, three other questions are posed that are related to this acoustic-phonetic approach and the thesis question:

Q1. What are reliable discriminative acoustic-phonetic features for the pronunciation errors of /A/, /Y/ and /x/?

Q2. How do the detectors trained under different conditions (on native or on non-native speech) cope with non-native speech?

Q3. What are the advantages of a Linear Discriminant Analysis (LDA) method as opposed to a decision tree-based method for automatic pronunciation error detection?
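To preview the contrast behind Q3, here is a minimal sketch of the two classifier types on a single acoustic feature. The F2-like values are invented for illustration, and real detectors use richer feature sets; in one dimension with equal priors and a shared variance, LDA reduces to a midpoint threshold, while a depth-1 decision tree (a stump) searches for the split that minimizes training errors:

```python
import statistics

def lda_threshold(a, b):
    """1-D two-class LDA boundary: with equal priors and a pooled
    variance, the Gaussian decision boundary is the midpoint of the
    two class means."""
    return (statistics.mean(a) + statistics.mean(b)) / 2

def stump_threshold(a, b):
    """Depth-1 decision tree: pick the split point that minimizes
    training errors (class a is assumed to lie below the split)."""
    points = sorted(a + b)
    candidates = [(p + q) / 2 for p, q in zip(points, points[1:])]
    def errors(t):
        return sum(x >= t for x in a) + sum(x < t for x in b)
    return min(candidates, key=errors)

# Toy F2-like values (Hz) for a back vs. front vowel realization.
back  = [850, 900, 950, 1000, 1050]
front = [1700, 1750, 1800, 1850, 1900]
print(lda_threshold(back, front))    # 1375.0
print(stump_threshold(back, front))  # 1375.0: both separate the classes
```

On such cleanly separated toy data the two methods agree; the interesting differences (distributional assumptions vs. axis-aligned splits, behaviour on overlapping classes) only show up on real data, which is what Q3 addresses.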

The following chapters describe how the goal of this study is achieved and how we try to find the answers to the questions posed above.

1.3 Structure of the thesis

Chapter 2 reports on a small literature study on automatic pronunciation error detection. First, we examine why L2 learners produce pronunciation errors (section 2.2) and show some examples of these errors (section 2.3). A description of possible goals of pronunciation teaching is given in section 2.4. Finally, overviews of different kinds of automatic pronunciation error detection techniques are given in sections 2.5 and 2.6. Chapter 3 describes the approach adopted in this study (section 3.2); part of this approach is the selection procedure for the pronunciation errors addressed in this study (section 3.3). Chapter 4 gives a description of the material and the different classification algorithms that were used in this study: the speech material is described in section 4.2 and two different classification algorithms in section 4.3. Chapter 5 reports on the development of the pronunciation error detectors for errors of /A/ and /Y/. First, the acoustic characteristics of the sounds are examined (section 5.2) to determine potential discriminative features. A description of the procedure for acoustic feature extraction is given in section 5.3, and the results of the classification experiments are shown in section 5.4. Finally, a discussion of the results is given in section 5.5. Chapter 6 reports on the development of the pronunciation error detectors for errors of /x/. The acoustic properties of this pronunciation error are examined in section 6.2, and two classification methods for this error are introduced in section 6.3. Finally, in section 6.4 the results of the classification experiments are shown; they are discussed in section 6.5. Chapter 7 gives a summary of the results.
Summaries of the classification experiments of /A/ vs /a:/, /Y/ vs /u,y/ and /x/ vs /k,g/ are given in sections 7.2, 7.3 and 7.4 respectively. Finally, in section 7.5, we try to answer the questions posed at the beginning of this thesis and give some suggestions for further research.

Some practical remarks: throughout this work, phonetic symbols are given in SAMPA notation; for a list of phonetic

symbols in IPA and SAMPA notation, see appendix B. A list of abbreviations used throughout this work is given in appendix A.

Chapter 2

Automatic detection of pronunciation errors: a small literature study

2.1 Introduction

Chapter 2 reports on a small literature study and consists of two parts: before we give an overview of automatic pronunciation error detection techniques (sections 2.5 and 2.6), we give some background information on pronunciation errors. Why L2 learners produce pronunciation errors, and what kind, is described in sections 2.2 and 2.3. Section 2.4 describes some possible goals in pronunciation teaching. The second part of this chapter gives an overview of automatic pronunciation error detection techniques (sections 2.5 and 2.6).

2.2 Why do L2 learners produce pronunciation errors?

Pronunciation errors exist because L2 sounds are not correctly produced by the L2 learner. How, then, are L2 sounds learned by L2 learners, or, to put it differently, why are L2 sounds not properly learned? Various studies that have investigated this issue have also paid attention to the relationship between production and perception of L2 sounds. The main question seems to be whether production precedes perception, or perception precedes production, in the process of acquiring an L2 (Llisterri, 1995). In other words, is an L2 learner able to produce an L2 sound accurately if the same sound is not correctly perceived? This relationship between production and perception

of L2 sounds is related to factors such as age of learning and knowledge of the L2. Some researchers have proposed that neurological maturation might lead to a diminished ability to add or modify the sensorimotor programs that configure the movements of the articulators for producing sounds in an L2 (e.g. McLaughlin, 1977). Many researchers believe that once a certain age is passed, new speech sounds cannot be learned perfectly. For instance, it was found that the later non-native speakers began learning English, the more strongly foreign-accented their English sentences were judged to be (Flege, Munro and MacKay, 1995). The existence of this so-called critical period is often explained by neurological maturation. Knowledge of the L2 may also affect the relationship between production and perception. Bohn & Flege (1990) investigated this factor by examining the production and perception of the English vowels /e:/ and /{/ (IPA /æ/) in two groups of German learners of English: an experienced group and an inexperienced group. The results showed clear differences between the two groups: the inexperienced group did not produce the contrast between the two vowels, but was able to differentiate them in a labeling task and thus was able to perceive them correctly; the experienced group did produce the contrast and performed better in the labeling task. Furthermore, the two groups relied on different acoustic cues in the labeling task. Bohn & Flege concluded that perception may lead production in the early stages of L2 speech learning and that production might be improved by experience. Evidence supporting the view that production precedes perception can be found in Borrell (1990), Neufeld (1988) and Briere (1966), who pointed out that when learning an L2, it is common that not all sounds that are correctly perceived will be correctly pronounced.
Furthermore, an experiment carried out by Sheldon & Strange (1982) with Japanese speakers of English showed that the production of the English contrast between /r/ and /l/ was more accurate than the perception of it. The view that perception precedes production is supported by evidence from many more studies, which seems to imply that, generally, perception does precede production, at least for vowels. Already in 1939, Trubetzkoy proposed that bilinguals tend to perceive L2 sounds through their own L1 phonology, which may lead to wrong productions or accentedness of L2 sounds. Borden, Gerber & Milsark (1983) examined the relationship between perception and production of English /l/ and /r/ in Korean learners of English. They found that perceptual judgments of /r/ and /l/ improved

before production, and that self-perception develops earlier and may be a prerequisite for accurate production. Flege (1993) examined vowel duration as a cue to voicing in English words produced and perceived by Chinese speakers of English. The study revealed correlations between differences in perceived vowel duration and degree of foreign accent, and Flege (1993) concluded that [...] nonnatives will resemble native speakers more closely in perceiving than in producing vowel duration differences [...]. Numerous studies have shown that listeners have difficulty perceiving and making phonetic distinctions that do not exist in their native language. A common view in the 1970s was that interference from the L1 is the primary phonological cause of non-native productions: 1) an L2 sound that is identified with a sound in the L1 will be replaced by the L1 sound; 2) contrasts between sounds in the L2 that do not exist in the L1 will not be honored; 3) contrasts in the L1 that are not found in the L2 may nevertheless be produced in the L2 (e.g. Weinreich, 1953; Lehiste, 1988). Two more recent working models that focus on phonological contrasts in L1/L2 and support the perceptual view on L2 learning are Flege's Speech Learning Model (SLM) and Best's Perceptual Assimilation Model (PAM). SLM (Flege, 1995) claims that [...] without accurate perceptual targets to guide the sensorimotor learning of L2 sounds, production of the L2 sounds will be inaccurate [...]. The model assumes that the phonetic systems used in the production and perception of vowels and consonants remain adaptive over the life span, and that new phonetic categories are added or old ones are modified when L2 sounds are encountered. It hypothesizes that many (but not all!) L2 production errors have a perceptual basis.
Learners perceptually relate positional allophones in the L2 to the closest positionally defined allophone in the L1 in acoustic-phonetic terms, such as the F1/F2 formant space for vowels. L2 learners can establish a new phonetic category for an L2 sound that differs from the closest L1 sound: the greater the perceived distance of an L2 sound from the closest L1 sound, the more likely it is that a new phonetic category will be established. According to PAM (Best, 1995), non-native sounds are perceptually assimilated to native phonetic categories according to their articulatory-phonetic (gestural) similarity to native gestural constellations (Browman & Goldstein, 1989), where gestures are defined by the articulators, place

of articulation and manner of articulation. The model states that non-native speech perception is strongly affected by the listener's linguistic experience with phonological contrasts, and that listeners perceptually assimilate non-native phones to native phones whenever possible. In PAM, a given non-native phone may be perceptually assimilated to the native system of phonemes in one of three ways:

- as a Categorized exemplar of some native phoneme: if two contrasting phones are both assimilated as good exemplars of a single native phoneme, perceptual differentiation is difficult; if the contrasting phones differ in their goodness of fit to that single native phoneme, perceptual differentiation is somewhat easier;

- as an Uncategorized sound that falls somewhere in between native phonemes: the non-native phone is roughly similar to two or more phonemes, and perceptual differentiation is easy;

- as a Nonassimilable nonspeech sound that bears no detectable similarity to any native phoneme: the non-native phone will be perceptually differentiated on the basis of its auditory or phonetic characteristics.

The main difference between the two models is that SLM places the emphasis on an acoustic-phonetic specification of phonetic similarity, whereas PAM assumes an articulatory specification of phonetic similarity.

2.3 What kind of pronunciation errors do L2-learners make?

A distinction can be drawn between pronunciation errors made on a segmental level and errors made on a suprasegmental level. On a segmental level, errors may concern vowel and consonant quality and may be explained by differences between language systems. An example of a segmental pronunciation error is the pronunciation of the Dutch /i/ in 'vies' and /I/ in 'vis': Japanese and Italian L2 learners of Dutch do not know the difference between /i/ and /I/ because this distinction does not exist in their L1.
The same applies to the mispronunciation of /A/ as /a:/: length is a distinctive feature in Dutch, whereas in e.g. Italian this distinctive feature does not exist.

Another example of a segmental pronunciation error is the mispronunciation of /x/, a very common error in Dutch, which again might be due to the fact that /x/ is not encountered in many other languages. Pronunciation errors may be mispronunciations of L2 sounds, also called substitutions, because an L2 sound is substituted with another sound, but deletions or insertions of sounds also occur. The two latter error types may be due to differences in syllable structure between L1 and L2. Japanese and Arabic do not allow branching onsets or codas, so an L2 word may be modified to fit the L1 syllable structure, which results in vowel epenthesis (see fig. 2.1 and fig. 2.2; both examples were taken from O'Grady et al., 1996). In Turkish, a word cannot begin with two consonants, and Spanish does not allow a word-initial /s/ followed by a sequence of consonants (O'Grady et al., 1996).

Figure 2.1: English target word with its syllable structure.

Figure 2.2: Non-native speaker's version of the English target word.

In addition to having deviant intonational contours and deviant lexical stress patterns, L2 learners tend to have lower speech rates and a higher number of disfluencies such as stops, repetitions, and pauses, which result in lower fluency (suprasegmental errors). An example of a suprasegmental pronunciation error is incorrect stress placement. L2 learners have to acquire the stress patterns of the language they are trying to learn, which is difficult because the stress patterns of the L1 interfere. Consider Polish, in which word-level stress is assigned to the penultimate (next-to-last) syllable, regardless of syllable weight, whereas in English, stress can also fall on the antepenultimate (third from the end of the word) syllable, depending on the weight of the syllable. The tendency of Polish speakers to place stress on the penultimate syllable regardless of syllable weight is a common pronunciation error in English (see table 2.1).

English target: asˈtonish, mainˈtain, ˈcabinet
Non-native form: asˈtonish, ˈmaintain, caˈbinet

Table 2.1: Example of a non-native stress pattern in which the next-to-last syllable is always stressed (this example was taken from O'Grady et al., 1996).

Researchers have examined the spectral differences between native and non-native speech and found that one of the largest differences between these two types of speech lies in the patterns of the second and higher formants (Arslan, 1996; Flege, 1987). This finding can be explained by Fant (1960), who showed that small changes in the configuration of the tongue position can lead to large shifts in the frequency location of F2 and F3, while the frequency location of F1 only changes if the overall shape of the vocal tract changes. To improve the intelligibility of L2 learners and the methods of pronunciation teaching, researchers have tried to establish pronunciation error gravity hierarchies, so that priority can be given to those errors that have the most damaging effect on the intelligibility of speech (e.g. Van Heuven et al., 1981; Anderson-Hsieh et al., 1992; Derwing & Munro, 1997). Although the answer to this issue is still not clear, it appears that both segmental and suprasegmental aspects play important roles. Both aspects can be measured separately, but they do influence each other, as the case of lexical stress illustrates: a stressed syllable is usually characterized by a clearer pronunciation (which may cause spectral differences; segmental), a higher amplitude (segmental), a higher pitch (suprasegmental) and a longer duration (suprasegmental).

2.4 Possible goals in pronunciation teaching

Studies have shown that foreign accents may have negative consequences for non-native speakers. Listeners detect divergences between the phonetic norms of their L1 and those of the non-native speaker, and may for instance misjudge the non-native speaker's affective state (e.g. Holden & Hogan, 1993).
Although several studies have shown that a general bias against foreign-accented speech exists and that native listeners tend to downgrade non-native speakers because of their foreign accent, these observations do not directly mean that language teachers should aim at teaching accent-free speech. Abercrombie (1956) argued that most language learners need no more than a comfortably intelligible pronunciation. Witt (1999) agrees with Abercrombie and

defined comfortable intelligibility as [...] a level of pronunciation quality, where words are correctly pronounced to their phonetic transcription, but there are still subtle differences in how these phonemes sound in comparison with native speakers [...] the speech of comfortably-intelligible non-native speakers might differ from native speakers with regard to intonation and rhythm, but on overall their speech is understandable without requiring too much effort from a listener [...]. Comfortable intelligibility seems to be a widely accepted goal in pronunciation teaching. Munro & Derwing (1995) describe intelligibility as [...] the extent to which a speaker's message is actually understood by a listener, but there is no universally accepted way of assessing it [...]. The goal of Munro & Derwing's study was to examine the interrelationships among accentedness, perceived comprehensibility and intelligibility in the speech of second language learners. Foreign accent and intelligibility are related, but it is still not clear how foreign accent affects intelligibility. The most important finding of their research is that [...] although strength of foreign accent is indeed correlated with comprehensibility and intelligibility, a strong foreign accent does not necessarily cause second language speech to be low in comprehensibility or intelligibility [...]. Their study thus suggests that existing programs and second language instructors aiming at foreign accent reduction or accent-free speech do not necessarily improve the intelligibility of a second language learner's speech. In the present study, we aim at teaching intelligible speech rather than accent-free speech (see also section 3.3.1); we agree with Abercrombie's view that most language learners do not need more than comfortable intelligibility.
2.5 Overview of automatic pronunciation error detection techniques in the literature

2.5.1 Overview of ASR-based techniques

In this section, the focus is on different techniques for automatic detection of pronunciation errors that have already been examined and described in the literature. These techniques should be built in such a way that they match the judgments of human listeners as closely as possible: in order to be valid, automatic pronunciation error detection techniques, or machine scores, should correlate with scores or judgments given by humans. Measures that seem to correlate well with

human judgments are temporal measures (which are acoustic measures); they are strongly correlated with human ratings of pronunciation and fluency (Cucchiarini et al., 2000; Neumeyer et al., 2000). Cucchiarini et al. (2000) showed that expert fluency ratings can be predicted on the basis of automatically calculated temporal measures such as rate of speech or articulation rate (timing scores). Another finding was that temporal measures for native and non-native speakers differed significantly, indicating that native speakers are more fluent than non-native speakers and that non-natives normally speak more slowly than natives. Fluency is often used in tests to evaluate non-native speakers' pronunciation. Consequently, other temporal measures related to rate of speech or articulation rate, such as duration scores (relative phone durations) and timing scores (rhythm), also correlate well with human listeners' judgments (see also Neumeyer et al., 2000). Thus, the above-mentioned temporal (acoustic) measures all function as good predictors of pronunciation quality because they correlate strongly with human judgments. In principle, therefore, machine scores based on temporal measures can suffice for good native and non-native pronunciation assessment, but not for pronunciation training in a CALL application: with temporal measures alone, feedback can only be given on temporal aspects of non-native pronunciation. Unfortunately, telling students to speak faster or to make fewer pauses does not help them much to improve their pronunciation. Therefore, temporal measures should be supplemented with other measures that are able to evaluate segmental or other suprasegmental aspects of non-native speech. Such measures and techniques have been developed by researchers to detect segmental pronunciation errors and to evaluate non-native pronunciation quality by using parameters from the ASR system.
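As a sketch of how such temporal measures can be derived from a phone-level time-alignment (the segment labels and times below are invented, and using 'pau' to mark silent pauses is an assumption about the alignment format): rate of speech counts phones over total time including pauses, while articulation rate excludes the pause time.

```python
def temporal_measures(segments):
    """segments: list of (label, start, end) tuples, in seconds, from a
    time-alignment; segments labeled 'pau' are silent pauses."""
    total = segments[-1][2] - segments[0][1]                     # whole utterance
    pause = sum(e - s for lab, s, e in segments if lab == "pau") # pause time
    n_phones = sum(1 for lab, _, _ in segments if lab != "pau")
    rate_of_speech    = n_phones / total            # phones/s, pauses included
    articulation_rate = n_phones / (total - pause)  # phones/s, speech time only
    return rate_of_speech, articulation_rate

# Toy alignment: four phones and one half-second pause.
segs = [("h", 0.0, 0.1), ("A", 0.1, 0.3), ("pau", 0.3, 0.8),
        ("l", 0.8, 0.9), ("o:", 0.9, 1.2)]
ros, ar = temporal_measures(segs)
print(round(ros, 2), round(ar, 2))   # 3.33 5.71
```

The gap between the two numbers is exactly the effect of pausing: a speaker who articulates quickly but pauses often has a high articulation rate and a low rate of speech.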
Nowadays, many CALL applications use measures that are generalized under the term "confidence measures": these ASR confidence measures represent in some way how confident the ASR system is that a given signal X belongs to pattern Y. ASR confidence measures have the advantage that they can be obtained fairly easily and that they can be calculated in similar ways for all speech sounds. These measures, based on spectral match, can be combined with temporal measures to compute a combined score that increases human-machine correlation. Because a good deal of statistics is involved in these scores and methods, I will first briefly explain some statistical terms.

The recognition problem in ASR can be reduced to the following statistical problem: given a set of measurements, vector X, what is the probability that it belongs to a word sequence W? In other words, compute P(W|X). The posterior probability P(W|X) cannot be computed directly: it can only be estimated after the data has been seen (hence the term "posterior"). Therefore Bayes' rule is used to estimate the posterior probability:

P(W|X) = P(X|W) P(W) / P(X)

In the formula above, P(X|W) represents the probability density function: given a word sequence W, what is the probability of observing vector X? This is often called the data likelihood. P(W) is the probability that the word sequence W was uttered: this represents the language model, which is independent of the observation vectors and is based on prior knowledge. P(X) is a fixed probability: the average probability that the vector X was observed.

These statistical measures, the likelihoods and posterior probabilities derived from the formula just presented, are used to supplement duration scores and timing scores in scoring the pronunciation quality of non-native speech. Log-likelihood is assumed to be a good measure of the similarity between native and non-native speech; therefore, Neumeyer et al. (1996) compared log-likelihood scores to segment duration scores (relative phone duration normalized by rate of speech) and timing scores (speaking rate, rhythm) by computing correlations between machine and human scores at sentence and speaker level. The correlations in Neumeyer et al. (1996) showed that HMM-based log-likelihoods are poor predictors of pronunciation ratings. The timing scores resulted in acceptable speaker-level correlations, but normalized segment duration scores produced the best results. So the duration-based scores outperformed the HMM-based log-likelihood scores. This study was extended in Franco et al. (1997) by examining other HMM-based scores, namely average phone segment posterior probabilities, and comparing them to log-likelihood and duration scores. This time, the HMM-based posterior probabilities produced higher human-machine correlations than log-likelihood and duration scores.

The two previous approaches (Neumeyer et al., 1996; Franco et al., 1997) focused on rating an entire sentence rather than targeting specific phone segments. Kim et al. (1997) extended this work by assessing the pronunciation quality of individual phone segments within a sentence. The probabilistic measures given in Franco et al. (1997) were compared to each other, and again the score based on posterior probability was the best at both phone and sentence level. Duration scores, which previously showed high human-machine correlations (Neumeyer et al., 1996), now turned out to be poor measures at phone level, although their results improved, and showed the strongest improvement, when the amount of training data increased. This is not surprising, since it is generally known that adding more training data can improve performance. Human-machine correlations at phone level were always lower than correlations at sentence level, so rating a single phone by machine is still problematic. The techniques presented in this study aim at evaluating a single phone; by adopting the approach presented in chapter 3, we hope to achieve higher human-machine agreement at segment level.

Another ASR-based method that focuses on rating a phone rather than a word or sentence is Witt & Young's Goodness of Pronunciation (GOP) method (Witt & Young, 2000). Their GOP score is primarily based on the posterior probability of an uttered phoneme; whether a phoneme was pronounced correctly is decided by comparing its GOP score to a predetermined threshold.

Thus, posterior probabilities and temporal measures individually produced good results at sentence level, so combining these scores might result in even higher human-machine correlations. Such combinations were examined in several studies (Franco et al., 1997; Franco et al., 2000), which indeed showed that a combination of scores produced higher human-machine correlations than a single posterior probability score in almost every case. Linear and nonlinear regression methods, used to predict the human grade from a set of machine scores, were investigated, and it appeared that a nonlinear combination of machine scores produced better results than a linear combination.
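A minimal sketch of a GOP-style phone score follows, under the common approximation that the phone posterior is the likelihood of the intended phone normalized by the best competing phone; the frame log-likelihoods and the threshold value are invented for illustration, whereas Witt & Young (2000) obtain these quantities from HMM forced alignment and free phone recognition.

```python
def gop_score(frame_loglikes, target):
    """Duration-normalized log posterior approximation for one phone.
    frame_loglikes: one dict per frame mapping phone -> log-likelihood."""
    n_frames = len(frame_loglikes)
    target_ll = sum(frame[target] for frame in frame_loglikes)
    best_ll = sum(max(frame.values()) for frame in frame_loglikes)
    # 0 means the intended phone is also the best-matching one;
    # more negative scores indicate a likely mispronunciation.
    return (target_ll - best_ll) / n_frames

# Three frames of invented log-likelihoods for two competing phone models
frames = [{"a": -2.0, "e": -1.5},
          {"a": -2.2, "e": -1.4},
          {"a": -1.9, "e": -1.6}]
score = gop_score(frames, target="a")
threshold = -0.5          # in practice tuned on annotated development data
mispronounced = score < threshold
```

Here the intended /a/ is flagged because the competing /e/ model matches every frame better; the threshold controls the trade-off between false alarms and missed errors.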
In the best case, an increase of 11% in correlation was obtained by using nonlinear regression with a neural network combining posterior, duration and timing scores (Franco et al., 2000). Thus, these studies have shown that optimal confidence scores can be combined to achieve higher human-machine correlations at sentence or speaker level.

2.5.2 Adding extra knowledge to acoustic models and ASR-based techniques

The measures described above were all obtained from HMM models trained on native speech only. Several methods have been introduced in which confidence scores were obtained from adapted acoustic models trained on both native and non-native speech. Furthermore, different methods have been proposed to integrate knowledge about the expected set of mispronunciations into the phone models or pronunciation networks.

HMM models trained with native speech data only can be expanded to form a network with alternative pronunciations, in which models trained on native and non-native speech are used. In the MisPronunciation (MP) network of Ronen et al. (1997), each phone can be optionally pronounced as a native or as a non-native sound. This network is then searched using the Viterbi algorithm. To evaluate the overall pronunciation quality, a mispronunciation score can be computed as the ratio of the number of non-native phones to the total number of phones in the sentence. The human-machine correlations obtained with the new MP models were almost equal to those of the previous native models.

Similarly to Ronen et al. (1997), Franco et al. (1999) used two different acoustic models for each phone, one trained on acceptable, native speech and another trained on incorrect, strongly non-native speech, to detect mispronunciations at phone level. For each phone, a log-likelihood ratio score was computed using the correct and incorrect pronunciation models and compared to a posterior probability score (we have seen that posterior scores correlate well with human scores; Franco et al., 1997; Kim et al., 1997) computed from models based only on native speech. The results showed that the method using both native and non-native models, i.e. the log-likelihood ratio score, yielded higher human-machine correlations than the method using only native models, i.e. the posterior score. Deroo et al. (2000) also used correct (native-like) and incorrect (strongly non-native-like) speech to train the acoustic models, but this time using a hybrid system combining HMMs and ANNs (Artificial Neural Networks) to detect mispronunciations at phone level.
Unfortunately, their phoneme models trained with native or non-native speech were very similar to each other, so the system was not able to discriminate between wrong and right pronunciations. A second approach produced better results. This time, knowledge about expected mispronunciations was used: phoneme graphs were built that take all expected wrong pronunciations of a phoneme into account. A disadvantage of this approach is that it requires knowing in advance all the mistakes that non-native speakers can make.
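The phoneme-graph idea just described can be sketched as a simple expansion of each target pronunciation with its expected errors. The substitution rule below is an invented example, and the sketch generates one substitution per variant for brevity; a real system would encode the full inventory of errors observed for the learner population.

```python
def expand_pronunciation(phones, rules):
    """Return the correct pronunciation plus variants derived from
    expected substitution errors (one substitution per variant)."""
    alternatives = {tuple(phones)}   # the correct form is always included
    for i, phone in enumerate(phones):
        for substitute in rules.get(phone, []):
            variant = list(phones)
            variant[i] = substitute
            alternatives.add(tuple(variant))
    return alternatives

# Invented rule: English /T/ is often realized as /s/ or /t/ by learners
rules = {"T": ["s", "t"]}
variants = expand_pronunciation(["T", "I", "N", "k"], rules)  # "think"
# yields the correct form plus the expected errors "sink" and "tink"
```

The recognizer is then free to choose among these variants, and the chosen variant reveals which (if any) expected error the student made; this is also why the approach fails on errors that were not anticipated in the rules.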

2.6 Automatic pronunciation error detection techniques employed in real-life applications

2.6.1 Employing ASR-based techniques in real-life CALL applications

Some of the methods and scores discussed in the previous sections are applied in real-life CALL systems, such as the SRI EduSpeak system (Franco et al., 2000), the ISLE system (Menzel et al., 2000) and the PLASER system (Mak et al., 2003). The EduSpeak toolkit uses acoustic models trained with Bayesian adaptation techniques that optimally combine native and non-native training data, so that both types of speakers can be handled by the same models with good recognition performance. In this way, recognition of non-native speakers was improved without degrading the recognition performance for native speakers. The score used in this system is a combination of previously discussed machine scores: the logarithm of the posterior probability, phone duration and speech rate.

In the ISLE system, which focuses on Italian and German learners of English, the development of the pronunciation training is divided into two components: automatic localization of pronunciation errors and correction of pronunciation errors (Menzel et al., 2000). Localization of pronunciation errors is done by identifying the areas of an utterance that are likely to contain pronunciation errors. Only the most severe errors are selected by the error localization component, which assigns confidence scores to each speech segment; a speech segment with a low confidence score represents a mispronounced segment. These scores are based on probabilistic measures such as the acoustic likelihood of the recognized path. After localizing areas that are likely to contain errors, specific pronunciation errors are detected and diagnosed for correction. Pronunciation errors that a student might make are predicted by rules that describe how a pronunciation is altered.
This results in a set of alternative pronunciations for each entry in the dictionary; the alternatives of course include the correct pronunciation. Again, all the mistakes that non-native speakers could make must be known in advance. Unfortunately, the system performed poorly at finding and explaining pronunciation errors (Menzel et al., 2000).

The PLASER system (Mak et al., 2003), designed to teach English pronunciation to speakers of Cantonese Chinese, computes a confidence-based score for each phoneme of a given word. An English corpus and a Cantonese corpus were both used to develop Cantonese-accented English phoneme HMMs. To assess the pronunciation accuracy of a phoneme, the Goodness of Pronunciation (GOP) measure is used. Evaluation of the system showed that the pronunciation accuracy of about 75% of the students improved after using the system for a period of 2-3 months.

2.6.2 Using acoustic-phonetic information in real-life CALL applications

The acoustic-phonetic approach, which is the approach adopted in this study, is not frequently used as a technique to detect pronunciation errors. Most of the existing methods use scores such as those described above to evaluate non-native speech. Some projects or systems adopt approaches resembling the acoustic-phonetic approach, in that they use raw acoustic data to provide feedback by displaying waveforms, spectrograms, energy or intonation contours. However, a substantial difference with our acoustic-phonetic approach is that, in those methods, no actual assessment is done on the basis of acoustic-phonetic data.

The VICK system (Nouza, 1998) displays user-friendly visual patterns formed from the students' speech (single words and short phrases) and compares them to reference utterances. Different types of parameters of the same signal are available for visualization, e.g. the time waveform, the spectrogram, energy or F0 contours, vowel plots, diagrams or phonetic labels. Feedback on the students' pronunciation is given by showing and pointing out deviations in a difference panel that indicates the parts of speech with major differences between the trainee's attempt and the references. The VICK system uses two classifiers for the automatic evaluation of speech, of which a DTW (Dynamic Time Warping) classifier is the primary one (Nouza, 1998). The distance between the utterance and the reference is evaluated for the whole set of features or for a specific feature subset such as log energy or F0. The evaluation is based on means and variances computed from the scores achieved with the reference speakers.
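A DTW classifier of the kind VICK uses can be reduced to the following textbook dynamic-programming recursion. For brevity the sketch compares 1-D feature tracks (e.g. log energy per frame); a real system compares sequences of multidimensional feature vectors.

```python
def dtw_distance(student, reference):
    """Minimal dynamic time warping distance between two 1-D feature tracks."""
    n, m = len(student), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(student[i - 1] - reference[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],        # stretch the student
                                 cost[i][j - 1],        # stretch the reference
                                 cost[i - 1][j - 1])    # advance both
    return cost[n][m]

# The student's track has an extra frame; DTW absorbs the timing difference
distance = dtw_distance([0.1, 0.9, 0.8, 0.2], [0.1, 0.8, 0.2])
```

The resulting distance can then be compared against the means and variances obtained from the reference speakers, as in VICK's evaluation.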
Figure 2.3: An example of VICK's screen (from Nouza, 1998)

In the SPELL project (Hiller et al., 1994), different modules, teaching consonants, vowel quality, rhythm and intonation, are characterized by an acoustic similarity metric used to evaluate the pronunciation of a student. For instance, the rhythm module uses duration and vowel quality as acoustic parameters. The vowel teaching module uses a set of acoustically based vowel targets, derived from a set of vowel tokens produced by a group of native speakers. First, a student's vowel token is analyzed to produce estimates of the formants and pitch. After a normalization procedure, these acoustic parameters are used to provide feedback in a graphical display for the student. The display shows an elliptic target for the vowel and the position of the user's attempt, and the vowel similarity metric decides whether the user's vowel token falls within this target vowel space.

The consonant module uses a rather different analysis. A list of pronunciation errors in consonant production by non-native speakers of English was first compiled and ranked according to the errors' expected effect on intelligibility. Substitutions were among the most frequent consonantal errors. These errors are detected in SPELL by using a simplified speech recognition technique: each utterance has a specified phonetic sequence containing the desired sequence of segments and the likely substitutions (errors) that the student might make. The errors produced by the student are then detected from the choices the speech recognizer made in recognizing the utterance.

WinPitch LTL (Germain-Rutherford & Martin, 2000) is another system that provides feedback by visualizing acoustic data. Learners can visualize the pitch curve, the intensity curve and the waveform of their own recorded speech. A useful feature of this system is speech synthesis: for instance, students can hear the correct prosodic contours produced with their own voice, and comparisons of prosodic patterns can be made between the students' recorded and synthesized segments. The system offers other user-friendly functions as well, such as many editing functions to facilitate the learning process. A major disadvantage, however, is that the system does not include ASR, so no automatic check of the contents of the student's utterance is available. Therefore, a teacher is required to do this (e.g. produce the phonetic transcription of the utterance) and to explain to the students the meaning of the various acoustic analyses.

Figure 2.4: Examples of SPELL's screen (from Hiller et al., 1994)

A general issue with CAPT systems that visualize acoustic data to give feedback to language learners is that some training in reading and understanding the displays is required beforehand, and that in some cases a teacher is needed. Furthermore, matching visual displays is not always to be recommended; for instance, it is known that matching acoustic waveforms is not very helpful. Consequently, visualizing acoustic data can be tricky, and this kind of data should therefore be used with care. Although these applications use acoustic information, actual assessment of pronunciation based on acoustic information is not carried out. The acoustic-phonetic approach adopted in this study, described in the next chapter (chapter 3), will use specific acoustic-phonetic information to evaluate non-native pronunciation.
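As a final concrete illustration of assessment based on acoustic-phonetic data, a SPELL-style elliptic vowel target can be expressed as a distance check in F1/F2 space. The target means and standard deviations below are invented round numbers, not measurements; a real target would be estimated from native speakers' vowel tokens after normalization.

```python
def within_vowel_target(f1, f2, target, n_std=2.0):
    """Accept a vowel token if it falls inside an axis-aligned ellipse of
    n_std standard deviations around the native target in F1/F2 space."""
    mean_f1, std_f1, mean_f2, std_f2 = target
    # Normalized squared distance: 1.0 exactly on the ellipse boundary
    d = ((f1 - mean_f1) / (n_std * std_f1)) ** 2 + \
        ((f2 - mean_f2) / (n_std * std_f2)) ** 2
    return d <= 1.0

# Invented native target for /a/: (mean F1, std F1, mean F2, std F2) in Hz
target_a = (700.0, 50.0, 1300.0, 100.0)
ok = within_vowel_target(720.0, 1350.0, target_a)   # close to the target
bad = within_vowel_target(450.0, 1900.0, target_a)  # far from the target
```

This is exactly the kind of binary accept/reject decision on acoustic-phonetic parameters that distinguishes such methods from purely visual feedback.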


More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Different Task Type and the Perception of the English Interdental Fricatives

Different Task Type and the Perception of the English Interdental Fricatives Different Task Type and the Perception of the English Interdental Fricatives Mara Silvia Reis, Denise Cristina Kluge, Melissa Bettoni-Techio Federal University of Santa Catarina marasreis@hotmail.com,

More information

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion

More information

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

More information

Voice conversion through vector quantization

Voice conversion through vector quantization J. Acoust. Soc. Jpn.(E)11, 2 (1990) Voice conversion through vector quantization Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara A TR Interpreting Telephony Research Laboratories,

More information

Acoustic correlates of stress and their use in diagnosing syllable fusion in Tongan. James White & Marc Garellek UCLA

Acoustic correlates of stress and their use in diagnosing syllable fusion in Tongan. James White & Marc Garellek UCLA Acoustic correlates of stress and their use in diagnosing syllable fusion in Tongan James White & Marc Garellek UCLA 1 Introduction Goals: To determine the acoustic correlates of primary and secondary

More information

THE PERCEPTION AND PRODUCTION OF STRESS AND INTONATION BY CHILDREN WITH COCHLEAR IMPLANTS

THE PERCEPTION AND PRODUCTION OF STRESS AND INTONATION BY CHILDREN WITH COCHLEAR IMPLANTS THE PERCEPTION AND PRODUCTION OF STRESS AND INTONATION BY CHILDREN WITH COCHLEAR IMPLANTS ROSEMARY O HALPIN University College London Department of Phonetics & Linguistics A dissertation submitted to the

More information

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

L1 Influence on L2 Intonation in Russian Speakers of English

L1 Influence on L2 Intonation in Russian Speakers of English Portland State University PDXScholar Dissertations and Theses Dissertations and Theses Spring 7-23-2013 L1 Influence on L2 Intonation in Russian Speakers of English Christiane Fleur Crosby Portland State

More information

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 -

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 - C.E.F.R. Oral Assessment Criteria Think A F R I C A - 1 - 1. The extracts in the left hand column are taken from the official descriptors of the CEFR levels. How would you grade them on a scale of low,

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Vowel mispronunciation detection using DNN acoustic models with cross-lingual training

Vowel mispronunciation detection using DNN acoustic models with cross-lingual training INTERSPEECH 2015 Vowel mispronunciation detection using DNN acoustic models with cross-lingual training Shrikant Joshi, Nachiket Deo, Preeti Rao Department of Electrical Engineering, Indian Institute of

More information

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM BY NIRAYO HAILU GEBREEGZIABHER A THESIS SUBMITED TO THE SCHOOL OF GRADUATE STUDIES OF ADDIS ABABA UNIVERSITY

More information

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Arabic Orthography vs. Arabic OCR

Arabic Orthography vs. Arabic OCR Arabic Orthography vs. Arabic OCR Rich Heritage Challenging A Much Needed Technology Mohamed Attia Having consistently been spoken since more than 2000 years and on, Arabic is doubtlessly the oldest among

More information

To appear in The TESOL encyclopedia of ELT (Wiley-Blackwell) 1 RECASTING. Kazuya Saito. Birkbeck, University of London

To appear in The TESOL encyclopedia of ELT (Wiley-Blackwell) 1 RECASTING. Kazuya Saito. Birkbeck, University of London To appear in The TESOL encyclopedia of ELT (Wiley-Blackwell) 1 RECASTING Kazuya Saito Birkbeck, University of London Abstract Among the many corrective feedback techniques at ESL/EFL teachers' disposal,

More information

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology ISCA Archive SUBJECTIVE EVALUATION FOR HMM-BASED SPEECH-TO-LIP MOVEMENT SYNTHESIS Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano Graduate School of Information Science, Nara Institute of Science & Technology

More information

Language Acquisition Chart

Language Acquisition Chart Language Acquisition Chart This chart was designed to help teachers better understand the process of second language acquisition. Please use this chart as a resource for learning more about the way people

More information

REVIEW OF CONNECTED SPEECH

REVIEW OF CONNECTED SPEECH Language Learning & Technology http://llt.msu.edu/vol8num1/review2/ January 2004, Volume 8, Number 1 pp. 24-28 REVIEW OF CONNECTED SPEECH Title Connected Speech (North American English), 2000 Platform

More information

Running head: DELAY AND PROSPECTIVE MEMORY 1

Running head: DELAY AND PROSPECTIVE MEMORY 1 Running head: DELAY AND PROSPECTIVE MEMORY 1 In Press at Memory & Cognition Effects of Delay of Prospective Memory Cues in an Ongoing Task on Prospective Memory Task Performance Dawn M. McBride, Jaclyn

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets

More information

Essentials of Ability Testing. Joni Lakin Assistant Professor Educational Foundations, Leadership, and Technology

Essentials of Ability Testing. Joni Lakin Assistant Professor Educational Foundations, Leadership, and Technology Essentials of Ability Testing Joni Lakin Assistant Professor Educational Foundations, Leadership, and Technology Basic Topics Why do we administer ability tests? What do ability tests measure? How are

More information

Journal of Phonetics

Journal of Phonetics Journal of Phonetics 41 (2013) 297 306 Contents lists available at SciVerse ScienceDirect Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics The role of intonation in language and

More information

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda

Content Language Objectives (CLOs) August 2012, H. Butts & G. De Anda Content Language Objectives (CLOs) Outcomes Identify the evolution of the CLO Identify the components of the CLO Understand how the CLO helps provide all students the opportunity to access the rigor of

More information

WHEN THERE IS A mismatch between the acoustic

WHEN THERE IS A mismatch between the acoustic 808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

A Game-based Assessment of Children s Choices to Seek Feedback and to Revise

A Game-based Assessment of Children s Choices to Seek Feedback and to Revise A Game-based Assessment of Children s Choices to Seek Feedback and to Revise Maria Cutumisu, Kristen P. Blair, Daniel L. Schwartz, Doris B. Chin Stanford Graduate School of Education Please address all

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

Journal of Phonetics

Journal of Phonetics Journal of Phonetics 40 (2012) 595 607 Contents lists available at SciVerse ScienceDirect Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics How linguistic and probabilistic properties

More information

OVERVIEW OF CURRICULUM-BASED MEASUREMENT AS A GENERAL OUTCOME MEASURE

OVERVIEW OF CURRICULUM-BASED MEASUREMENT AS A GENERAL OUTCOME MEASURE OVERVIEW OF CURRICULUM-BASED MEASUREMENT AS A GENERAL OUTCOME MEASURE Mark R. Shinn, Ph.D. Michelle M. Shinn, Ph.D. Formative Evaluation to Inform Teaching Summative Assessment: Culmination measure. Mastery

More information

DIBELS Next BENCHMARK ASSESSMENTS

DIBELS Next BENCHMARK ASSESSMENTS DIBELS Next BENCHMARK ASSESSMENTS Click to edit Master title style Benchmark Screening Benchmark testing is the systematic process of screening all students on essential skills predictive of later reading

More information

A student diagnosing and evaluation system for laboratory-based academic exercises

A student diagnosing and evaluation system for laboratory-based academic exercises A student diagnosing and evaluation system for laboratory-based academic exercises Maria Samarakou, Emmanouil Fylladitakis and Pantelis Prentakis Technological Educational Institute (T.E.I.) of Athens

More information

An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District

An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District Report Submitted June 20, 2012, to Willis D. Hawley, Ph.D., Special

More information

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused

More information

Phonological encoding in speech production

Phonological encoding in speech production Phonological encoding in speech production Niels O. Schiller Department of Cognitive Neuroscience, Maastricht University, The Netherlands Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands

More information

Effect of Word Complexity on L2 Vocabulary Learning

Effect of Word Complexity on L2 Vocabulary Learning Effect of Word Complexity on L2 Vocabulary Learning Kevin Dela Rosa Language Technologies Institute Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA kdelaros@cs.cmu.edu Maxine Eskenazi Language

More information

Interpreting ACER Test Results

Interpreting ACER Test Results Interpreting ACER Test Results This document briefly explains the different reports provided by the online ACER Progressive Achievement Tests (PAT). More detailed information can be found in the relevant

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

Age Effects on Syntactic Control in. Second Language Learning

Age Effects on Syntactic Control in. Second Language Learning Age Effects on Syntactic Control in Second Language Learning Miriam Tullgren Loyola University Chicago Abstract 1 This paper explores the effects of age on second language acquisition in adolescents, ages

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

Assessing speaking skills:. a workshop for teacher development. Ben Knight

Assessing speaking skills:. a workshop for teacher development. Ben Knight Assessing speaking skills:. a workshop for teacher development Ben Knight Speaking skills are often considered the most important part of an EFL course, and yet the difficulties in testing oral skills

More information

Generative models and adversarial training

Generative models and adversarial training Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?

More information

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

More information

Psychology of Speech Production and Speech Perception

Psychology of Speech Production and Speech Perception Psychology of Speech Production and Speech Perception Hugo Quené Clinical Language, Speech and Hearing Sciences, Utrecht University h.quene@uu.nl revised version 2009.06.10 1 Practical information Academic

More information

Abstractions and the Brain

Abstractions and the Brain Abstractions and the Brain Brian D. Josephson Department of Physics, University of Cambridge Cavendish Lab. Madingley Road Cambridge, UK. CB3 OHE bdj10@cam.ac.uk http://www.tcm.phy.cam.ac.uk/~bdj10 ABSTRACT

More information