Hidden Markov-model based text-to-speech synthesis


Budapest University of Technology and Economics
Department of Telecommunications and Media Informatics

Hidden Markov-model based text-to-speech synthesis

Ph.D. thesis booklet

Doctoral School of Electrical Engineering

Bálint Pál Tóth, M.Sc.

Supervisors: Géza Németh, Ph.D., Gábor Olaszy, D.Sc.

Budapest, Hungary, 2013

1. Introduction

Speech production is a complex process: the brain precisely controls the articulatory system at high speed, and the speaker gets audio feedback of his or her communication via the hearing organs. To mimic speech production artificially, not only the articulatory system but also the mechanism of the brain should be understood. Since we are far from understanding the brain, models for speech production are built. The general goal of speech synthesis is to create a natural-sounding, highly intelligible synthetic voice. In addition, the following basic engineering aspects must be kept in mind: available resources and target platforms. The dissertation and the current thesis booklet focus on the general goal and the engineering aspects as well.

2. Background

In general, text-to-speech (TTS) synthesis systems consist of two main parts: a text preprocessor and speech generation (see Figure 1). The input text is converted into a feature matrix, which contains the phonemes of the input text and additional information (e.g. stresses, segmental features) generated by the text preprocessor. According to this feature matrix, the synthesized voice waveform is created by the speech generation module.

Figure 1. The general structure of TTS synthesis systems.

The artificial production of speech has a long history. The first mechanical speech production system was built by Farkas Kempelen back in 1791 [1]. In the last three decades computer-based approaches have been preferred [2,3]. Articulatory [4] and formant [5] synthesis tries to model the mechanism of the speech organs. Diphone and triphone based speech synthesis concatenates phoneme-level waveforms [6,7]. Unit selection speech synthesis systems concatenate waveforms from a precisely labelled speech corpus based on concatenation and target costs [8,9].
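The two-stage structure of Figure 1 can be sketched in code. This is only an illustration of the data flow; every class and function name below is invented for this sketch and is not part of any of the cited systems.

```python
# Illustrative sketch of the two-stage TTS pipeline of Figure 1.
# All names and the letter-to-phoneme mapping are hypothetical.
from dataclasses import dataclass

@dataclass
class Feature:
    phoneme: str      # phoneme symbol
    stressed: bool    # stress flag
    position: int     # position within the utterance (segmental feature)

def text_preprocessor(text: str) -> list[Feature]:
    """Convert the input text into a feature matrix (one row per phoneme)."""
    features = []
    for i, ch in enumerate(text.lower()):
        if ch.isalpha():  # naive one-letter-one-phoneme mapping, for illustration only
            features.append(Feature(phoneme=ch, stressed=(i == 0), position=i))
    return features

def speech_generation(features: list[Feature]) -> bytes:
    """Turn the feature matrix into a waveform (here: a silent placeholder)."""
    return b"\x00\x00" * (len(features) * 80)  # 80 16-bit samples per phoneme

waveform = speech_generation(text_preprocessor("Hello"))
```

A real system replaces both stubs: the preprocessor performs phonetic transcription and stress assignment, and the generation module runs a synthesis back end such as a vocoder.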
Recently, hidden Markov-model (HMM) based speech synthesis has become a major focus of research [10]. HMM-based TTS systems produce high-quality, human-like voices. Compared to other speech synthesis systems of similar speech quality, the footprint of HMM speech synthesis is fairly small, but its computational cost and playback latency are often high. HMM-based TTS consists of two main parts: the training and the speech synthesis components. During the training process, HMM parameters are learned from a large, precisely labelled speech corpus and generative models are built. The parameter estimation is based on the maximum likelihood (or a similar) technique:

λ̂ = argmax_λ p(O | W, λ)   (1)

where λ contains the model's parameters, O is the observation vector extracted from the speech corpus (training data) and W denotes the word sequence representing the speech corpus. As a result, a small HMM database is created, which includes the representative parameters of the speech corpus. At the synthesis stage, the output parameter sequence ô that best matches the textual word sequence w is generated by maximizing the output probability of the trained model λ̂:

ô = argmax_o p(o | w, λ̂)   (2)

From these parameters the synthetic voice is generated by a vocoder algorithm. Figure 2 shows the general architecture of a hidden Markov-model based text-to-speech (HMM-TTS) synthesis system. Excitation and spectral parameters are extracted from the waveform, and context-dependent labels are calculated based on the phonetic transcription. This information is passed to the training algorithm (Equation 1). Context-dependent labels typically contain phonemes, phoneme boundaries, accents and segmental information (phoneme, syllable, word, phrase and sentence level), and they may contain several additional features as well. The number of possible combinations of context-dependent labels is very high, so a representative speech corpus containing all possible variations cannot be created. To overcome this problem, decision tree based context clustering is used. In the training phase separate generative models are built for excitation parameters, spectral parameters and state durations. Continuous parameter streams (e.g. spectral parameters) are modelled by Gaussian distributions, and mixed discrete/continuous parameters (e.g. voiced/unvoiced regions in excitation) are modelled with multispace probability distributions (MSD). In order to model timing properly, state durations are modelled by Gaussian distributions. In the synthesis phase Equation 2 is maximized: the HMM generative models create the most likely parameter stream for the input text.
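As a toy numerical illustration of Equations 1 and 2 (not part of the thesis itself): for a single Gaussian-modelled stream, maximum-likelihood training reduces to estimating the sample mean and variance, and parameter generation without dynamic features simply returns the mean, since that is where the Gaussian likelihood peaks.

```python
import math

# Toy illustration of Eqs. (1) and (2) for one Gaussian-modelled stream.
observations = [1.0, 2.0, 3.0, 2.0]   # e.g. values of one spectral parameter

# "Training" (Eq. 1): ML estimates of the Gaussian parameters.
mu = sum(observations) / len(observations)                      # 2.0
var = sum((o - mu) ** 2 for o in observations) / len(observations)  # 0.5

def log_likelihood(o: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (o - mu) ** 2 / var)

# "Synthesis" (Eq. 2): searching a grid for the most likely output
# recovers the mean, because a Gaussian peaks at its mean.
o_hat = max((o / 100 for o in range(0, 500)), key=log_likelihood)
assert abs(o_hat - mu) < 0.01
```

In the real system the maximization runs over whole parameter trajectories and includes dynamic (delta) features, which is what makes the generated streams smooth.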
The waveform is created from this parameter stream with a vocoder algorithm. HMM-based speech synthesis has numerous advantages compared to other methods. Its voice quality is comparable to that of state-of-the-art unit selection methods, the runtime database is small (2-10 MB), the voice characteristics can be changed by speaker adaptation and interpolation, and emotions can be expressed as well.

3. Research objectives

My general research topic is hidden Markov-model based text-to-speech synthesis. I focus on three different research areas of HMM-TTS: one of them is principally Hungarian language specific; two of them are language independent. The first research objective is creating a Hungarian hidden Markov-model based speech synthesis system and improving its quality. This part of the research includes designing speech corpora, introducing language specific features, distinctive features,

creating speaker dependent and speaker adaptive HMM-TTS systems, and measuring the quality improvements of manual correction of automatic labelling. The second research objective is automatic speech recognizer transcription based unsupervised speaker adaptation of HMM-TTS systems with semi-spontaneous speech. The possibility of speaker adaptation with semi-spontaneous speech is investigated and an unsupervised speaker adaptation method is introduced. Results of subjective evaluation show that the proposed method is not significantly different from the supervised case even though the phoneme error rate (PER) is about 50%. The unsupervised adaptation method is extended to higher PERs as well. The third research topic is optimizing HMM-TTS for low-resource devices. The noise generation algorithm in the excitation modelling is modified, optimal spectral parameter settings are investigated, the number of nodes in decision trees is reduced, and the segment size of parameter generation, the vocoder algorithm and waveform playback are optimized according to the performance of the device and the actual load of the CPU. I have chosen hidden Markov-model based text-to-speech synthesis as my research topic because of its novelty and countless possibilities. Furthermore, it was a challenge to pioneer HMM-TTS research in Hungary. In the current thesis booklet I summarize the novel outcomes of my research grouped into the three research objectives.

Figure 2. The general architecture of HMM-TTS; based on [10].
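The context-dependent ("full-context") labels that appear in the training architecture of Figure 2 can be illustrated with a minimal label builder. The field set and delimiter format below are simplified assumptions for illustration; real HTS-style labels carry many more context fields.

```python
# Simplified sketch of building a context-dependent (full-context) label
# for one phoneme. The format is illustrative, not the thesis's actual scheme.
def full_context_label(prev: str, cur: str, nxt: str,
                       pos_in_syllable: int, pos_in_word: int,
                       stressed: bool) -> str:
    """Encode a phoneme with its left/right context and segmental features."""
    return (f"{prev}-{cur}+{nxt}"
            f"/syl:{pos_in_syllable}/word:{pos_in_word}"
            f"/stress:{int(stressed)}")

label = full_context_label("sil", "h", "a", 1, 1, True)
```

One such label is produced for every phoneme of every utterance; the decision-tree clustering then asks yes/no questions about these fields.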

4. Methodology

In this chapter I introduce the speech corpora, the tools and the evaluation methods used during my research.

4.1. Speech corpora

A speech corpus contains the following: waveforms (studio recordings are preferred), phonetic transcription and segmentation labels (phoneme boundaries). At the beginning of the research there was no Hungarian speech corpus available that was suitable for HMM-TTS. It is important for HMM-TTS that the speech corpus contains phonetically balanced sentences whose phoneme distribution follows Hungarian language characteristics. The MTBA speech corpus includes about 6-7 minute long telephone conversations from 500 speakers, basically for speech recognition purposes [11]. I investigated the utterances of the MTBA database and found the sentence material suitable for HMM-TTS purposes, although at least one hour of studio-quality recordings (min. 44 kHz, 16 bit) is required from each speaker, and a smaller number of speakers is enough (5-10 speakers). Therefore, based on the MTBA sentences and with the help of BME-TMIT Speech Laboratory colleagues, we recorded and segmented speech corpora from seven speakers (approx. 20 hours altogether). We took into account the experience gained from the creation of the MTBA database [12]. I used these speech corpora in the first thesis group. In the second thesis group, in addition to the speech corpora of the first thesis group, I used semi-spontaneous parliamentary speeches. The semi-spontaneous speech corpora were collected from four speakers (approx. 4 hours), from which I selected 10 minutes per speaker for unsupervised adaptation with different methods. In the third thesis group I worked with an English speech corpus: the SLT speaker of the ARCTIC database from the Speech Technology Laboratory of Carnegie Mellon University [13]. The speech corpora used in my research are summarized in Table 1.
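The phonetic balance requirement above boils down to comparing phoneme frequencies across candidate material. The following is a minimal sketch of the kind of statistic involved; the toy phoneme sequences are invented and do not come from the corpora described here.

```python
# Sketch: computing the phoneme distribution of candidate sentences, the
# kind of statistic used when selecting phonetically balanced material.
from collections import Counter

def phoneme_distribution(sentences: list[list[str]]) -> dict[str, float]:
    """Relative frequency of each phoneme over all sentences."""
    counts = Counter(p for s in sentences for p in s)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

dist = phoneme_distribution([["h", "a", "l", "o"], ["a", "l", "m", "a"]])
assert abs(sum(dist.values()) - 1.0) < 1e-9
```

A balanced subset is then chosen so that this distribution approximates the target language-wide distribution.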
In my first thesis group I used only a part of the whole speech corpus of a given speaker for speaker adaptation (10-15 minutes). In the second thesis group I used the databases of the first thesis group for the average voice. For speaker adaptation I created several adaptation corpora, which are described briefly in this booklet and in detail in my dissertation (these are not shown in the table).

4.2. Synthesized sentences for listening tests

My goal is to create a general solution. Consequently, I synthesized generic declarative sentences (not domain specific ones) with the systems I created. These synthesized utterances were used in the listening tests. The language of the test sentences was defined by the language of the given HMM-TTS system.

Table 1. The speech corpora used for HMM-TTS research.

Thesis group  Symbol          Length    Sex     Language   Purpose
I.            M1              190 min   male    Hungarian  Speaker dependent training,
              M2              137 min   male               average voice training,
              M3              170 min   male               (supervised) speaker adaptation
              M4              214 min   male
              M5              198 min   male
              F1              128 min   female
              F2              193 min   female
II.           M6              11.4 min  male    Hungarian  Supervised and unsupervised
              M7              9.6 min   male               speaker adaptation
              M8              10.2 min  male
              M9              9.7 min   male
III.          CMU-ARCTIC-SLT  47 min    female  English    HMM speech synthesis on
                                                           low-resource devices
                                                           (speaker dependent)

4.3. Experimental environment

I used open source tools and previous solutions of BME-TMIT. The main toolkits and applications used were the following (the complete list can be found in my dissertation):
- HTS (HMM-based Speech Synthesis System): training of speaker dependent HMMs, training of average voice HMMs, speaker adaptation [14]
- SPTK (Speech Signal Processing Toolkit): parameter extraction and the pulse-noise excitation based vocoder algorithm [15]
- STRAIGHT: parameter extraction and the mixed excitation based vocoder algorithm [16]
- hts_engine: parameter generation from HMMs and waveform generation with the pulse-noise excitation based vocoder algorithm [14]
- ProfiVox: phonetic transcription and accent determination [7]
- Hungarian large vocabulary automatic speech recognizer [17]
- Forced alignment to determine phoneme boundaries automatically [17]

4.4. Subjective evaluation

I used mean opinion score (MOS) and comparison mean opinion score (CMOS) listening tests for subjective evaluation. In MOS tests, test subjects had to score each utterance from 1 (worst) to 5 (best, integers). In CMOS tests, subjects had to decide on a 5-point scale which of two utterances fulfils a given criterion (e.g. quality, naturalness, intelligibility) better. In some cases test subjects had to decide for themselves what they considered quality to mean. This way I received general feedback about how the subjects judge the overall quality of the TTS system.
Several features are included in the overall quality, e.g. naturalness, sympathy and the emotions triggered by the voice characteristics. In other cases test subjects were asked to score a specific feature, e.g.

naturalness of the synthetic voice. The precise settings of the listening tests are described in detail in my dissertation. The MOS and CMOS figures show the average values and 95% confidence intervals. I checked statistical significance in each case. If two results had to be compared, I used a two-sample t-test (MOS tests) or a one-sample t-test (CMOS tests). If more than two results had to be compared, I used the ANalysis Of VAriance (ANOVA) significance test. If the ANOVA showed a significant difference, Tukey's test was used for post hoc comparison. I tested significance at the 95% confidence level (α = 0.05). In some cases MOS tests resulted in rather low values (~3), while in other cases similar HMM-TTS systems scored better (~3.5-4). The difference can be explained by the involvement of natural speakers in the former case, with only synthetic voices involved in the latter: if a natural speaker is present, synthetic voices are judged worse than when only artificial voices are compared.

5. New results

5.1. Thesis Group I. Hidden Markov-model based text-to-speech synthesis applied to Hungarian and quality enhancements

First I created a Hungarian HMM-TTS system and compared it to previous Hungarian TTS systems. The synthetic voice quality of the system introduced in Thesis I.1 is enhanced by distinctive features in Thesis I.2. I applied speaker adaptation and showed that with speaker adaptation it is possible to create a synthetic voice with significantly better quality than with speaker dependent training (Thesis I.3). At the end of Thesis Group I the effects of manual correction of labelling (segmentation, phonetic transcription) are investigated in the speaker dependent and speaker adaptive cases. The evaluation of the results includes subjective listening tests in each thesis, and an investigation of decision trees in Thesis I.2.

Thesis I.1.
[J2, J3, J4, B2a, B3, C6, C7] I designed and implemented hidden Markov-model based text-to-speech synthesis in Hungarian, and I showed that, with a significantly smaller database size, the quality of the HMM-TTS system is not significantly worse than that of the state-of-the-art, domain specific corpus based Hungarian text-to-speech system.

At the beginning of the research there was no Hungarian HMM-TTS solution, so I could only use international publications as guidelines [18,19]. Because of the differences between languages and the structure of Hungarian, it was not a trivial task to create a Hungarian HMM-TTS system. As a first step of the research, suitable speech corpora were created (see Chapter 4.1). Based on the features of the Hungarian language I defined the possible phonemes, the context-dependent labels and the questions for the decision trees [20]. These are described in detail in the dissertation and the related publications. I created an HMM-TTS voice with the M1 speech corpus (see Table 1) and synthesized sentences with a mixed-excitation vocoder.
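The decision-tree questions mentioned above take a yes/no form over phoneme classes. The following sketch shows the mechanism only; the phoneme classes are invented examples, not the actual Hungarian question set of the thesis.

```python
# Illustrative yes/no questions for decision-tree context clustering.
# The phoneme classes below are examples, not the actual Hungarian set.
PHONEME_CLASSES = {
    "Vowel": {"a", "e", "i", "o", "u"},
    "Nasal": {"m", "n"},
    "Stop":  {"p", "b", "t", "d", "k", "g"},
}

def question(class_name: str, phoneme: str) -> bool:
    """Answer the clustering question 'Is <phoneme> a <class_name>?'."""
    return phoneme in PHONEME_CLASSES[class_name]

assert question("Nasal", "m") is True
assert question("Vowel", "m") is False
```

During tree building, the question that best splits the training data (by likelihood gain) is chosen at each node, so unseen phoneme contexts fall into a cluster of acoustically similar seen contexts.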

Evaluation: I measured the quality of the resulting HMM-TTS system with listening tests. I compared the novel solution with two previous TTS systems developed at BME-TMIT: a triphone based [7] and a corpus based TTS system [9]. Figure 3 shows the results of the listening tests (left) and the runtime database sizes (right). According to the results, the quality of HMM-TTS is not significantly worse than that of the state-of-the-art corpus based system, and it has a significantly smaller runtime database. Furthermore, the quality of HMM-TTS is significantly better than the quality of triphone based synthesis, while the database sizes of these systems are not significantly different.

Conclusion: the runtime database size of the corpus based TTS system is about 850 MBytes (12 hours of recordings), whereas the runtime database size of the HMM-TTS system is about 10 MBytes (with 2 hours of recordings in the training database). The corpus based system produces constant quality in fixed domains (e.g. weather forecast); the HMM-TTS system gives constant quality in general domains. (The sub-phoneme level parametric model of HMM-TTS is a general speech synthesis technique. Furthermore, in Theses I.2, I.3, I.4, II.1 and II.2 listening tests were carried out with synthesized sentences from different domains and there was no perceptual difference between the quality of the synthetic speech from the different domains.) According to the listening test of the current thesis there was no significant perceptual difference between the HMM-TTS and the corpus based TTS even in a fixed domain. These results prompted me to make deeper investigations with HMM-TTS.

Figure 3. Subjective evaluation with listening tests (left) and runtime database sizes (right) of the HMM, corpus and triphone based TTS systems.

Thesis I.2.
[C2] I designed and implemented a method to apply distinctive features to a hidden Markov-model based text-to-speech system, and I showed that it is possible to increase the quality of the synthetic speech by applying them.

The same organs are used for speech production, and sound generation is independent of language [20]. The capability of speech production is universal, although there are many language dependent differences. Distinctive features describe phonemes with binary and unary values in a language independent way [21]. In Thesis I.1 I introduced a classification of Hungarian phonemes. With the help of distinctive features a more general description of phonemes is possible. I defined a set of distinctive features suitable for HMM-TTS for engineering purposes, considering general linguistic principles and concepts. In the

elaborated hierarchy 18 distinctive features of three groups (articulator-free, articulator-bound, larynx) were used. I added the distinctive features to the HMM-TTS system of Thesis I.1 by extending the questions used for decision tree building: I assigned two questions to each binary feature and one question to each unary feature. My expectation was that distinctive features create more general clusters than the conventional notation.

Evaluation: I investigated the effects of distinctive features by analysing the changes in the decision trees (compared to the system of Thesis I.1). The results are shown in Table 2 and Table 3. The decision trees of the five states are summarized together; this way each value represents 5 states × 5 speakers = 25 decision trees. The headers of both tables show the parameter streams of mixed excitation. Table 2 shows how the parameter streams were influenced by distinctive features. The biggest influence was in the case of the spectral parameters, although all other streams are affected as well. Table 3 shows the ten most frequent distinctive features; articulator-free features occur in more than 50% of the decision trees. I measured the perceptual effect of distinctive features with MOS and CMOS listening tests. The results of the CMOS test are shown in Figure 4. M1 denotes the experimental system of Thesis I.1 and M1-DF denotes the HMM-TTS with distinctive features. The figure shows that distinctive features increased the quality (M1-DF is preferred to M1). The results of the MOS test show an increment in quality as well. These results are described in detail in the dissertation and the related publication.

Table 2. The ratio of distinctive features in the decision trees of the Hungarian HMM-TTS (mixed excitation).

                        F0      Spectral     Voicing    Duration   All
                                parameters   strength
Ratio of distinctive
feature nodes           19.3%   43.1%        27.2%      22.7%      23.8%

Table 3.
The ten most frequent distinctive features in the decision trees (articulator-free distinctive features are bold and italic in the dissertation).

       F0          Spectral      Voicing       Duration
                   parameters    strength
 1.    sonorant    back          lateral       sonorant
 2.    low         sonorant      sonorant      continuant
 3.    continuant  round         continuant    nasal
 4.    lateral     nasal         round         high
 5.    nasal       coronal       voiced        round
 6.    round       low           nasal         consonantal
 7.    voiced      high          low           lateral
 8.    high        lateral       strident      voiced
 9.    strident    continuant    consonantal   strident
10.    back        labial        low           low
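The question-generation rule described in Thesis I.2 (two questions per binary feature, one per unary feature) might be sketched as follows. The feature lists here are short illustrative samples, not the thesis's full 18-feature hierarchy.

```python
# Sketch of generating decision-tree questions from distinctive features,
# following the rule: two questions per binary feature (the + and - values),
# one per unary feature. The feature lists are illustrative only.
BINARY_FEATURES = ["sonorant", "continuant", "voiced"]
UNARY_FEATURES = ["lateral"]

def feature_questions() -> list[str]:
    qs = []
    for f in BINARY_FEATURES:
        qs.append(f"Is the phoneme [+{f}]?")
        qs.append(f"Is the phoneme [-{f}]?")
    for f in UNARY_FEATURES:
        qs.append(f"Is the phoneme [{f}]?")
    return qs

qs = feature_questions()
assert len(qs) == 2 * len(BINARY_FEATURES) + len(UNARY_FEATURES)
```

These questions are added to the conventional phoneme-class questions, giving the clustering algorithm more general splits to choose from.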

Conclusion: distinctive features increased the speech quality of HMM-TTS, and the structure of the decision trees was also remarkably affected. Apart from the practical aspects (better speech quality), distinctive features bring HMM-TTS closer to the nature of speech production.

Figure 4. Subjective evaluation of the effects of distinctive features with a CMOS listening test.

Thesis I.3.

[J2, B1, B2a, C6, C7] I designed a supervised speaker adaptation method for hidden Markov-model based text-to-speech synthesis in Hungarian, which requires less than 10% of the speech corpus of the speaker dependent case to create new voices from the average voice model. I showed that it is possible to produce a synthetic voice with significantly better quality than in the speaker dependent case.

In the current thesis I examined one of the most important features of hidden Markov-model based speech synthesis: speaker adaptation. I created the average voice model with the M2, M3, M4, M5 and F2 speech corpora, considering distinctive features (Thesis I.2). Next I modified the HMMs with an MLLR (Maximum Likelihood Linear Regression) procedure according to the parameters extracted from the M1 and F1 speech corpora [22]. I denote speaker dependent cases with SD and speaker adapted cases with SA.

Evaluation: after speaker adaptation a male (SA-M1) and a female (SA-F1) voice were created. I compared the perceptual difference of these systems from the speaker dependent models of M1 and F1 (denoted by SD-M1 and SD-F1) with MOS and CMOS listening tests. Figure 5 shows the results of the CMOS test: the speaker adapted systems were preferred. The results of the MOS test are described in detail in the dissertation. In both cases (CMOS, MOS) the quality of the speaker adapted systems was significantly better than in the speaker dependent case.

Figure 5.
Subjective evaluation of speaker dependent and speaker adapted HMM-TTS with a CMOS listening test.
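The core of the MLLR procedure used in Thesis I.3 is a linear transform of the average-voice Gaussian means, μ' = Aμ + b, estimated per regression class. The following toy sketch only shows the transform itself, with invented values; estimating A and b from adaptation data is the actual (omitted) work.

```python
# Toy sketch of an MLLR mean transform: average-voice Gaussian means are
# mapped towards the target speaker by mu' = A @ mu + b. Values illustrative.
def mllr_transform(mu: list[float], A: list[list[float]],
                   b: list[float]) -> list[float]:
    """Apply the affine MLLR transform to one mean vector."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

mu = [1.0, 2.0]                      # average-voice mean (illustrative)
A = [[1.1, 0.0], [0.0, 0.9]]         # estimated rotation/scaling
b = [0.5, -0.2]                      # estimated bias
adapted = mllr_transform(mu, A, b)   # both components come out near 1.6
```

Because one (A, b) pair is shared by many Gaussians, a few minutes of adaptation speech suffice to move the whole average voice towards the target speaker.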

Conclusion: based on the results of this thesis, 10-15 minute recordings are enough for creating new HMM-TTS voice characteristics, in contrast to the speaker dependent case of Thesis I.1 (2-3 hour recordings). The quality of the speaker adapted voice can even be significantly better.

Thesis I.4.

[B1] I showed experimentally that the manual correction of the automatic labelling of the training speech corpus may be substituted by automatic methods only, because manual correction does not always cause a significant improvement in the quality of synthetic speech in speaker dependent and speaker adapted HMM-TTS systems.

After creating Hungarian speaker dependent and speaker adapted HMM-TTS systems, my goal was to investigate the correlation between the quality of the synthetic speech and the precision of the speech corpus. Manual correction of automatic labels requires deep knowledge and high precision; it is time consuming work. Speaker dependent (SD) and speaker adapted (SA) HMM-TTS voices were created with automatically labelled (auto) and manually corrected (manual) male (M1) and female (F1) speech corpora. The average voice was built from automatically labelled speech corpora (the same as used in Thesis I.3). Phoneme error related data are shown in Table 4, segmentation error related results in Table 5. Table 4 shows that only a small percentage of the database was affected by phoneme errors in the speaker dependent speech corpora (0.83%, 0.52%). In the adaptation speech corpora a higher phoneme error ratio was measured (15.5%, 6%). The header of Table 5 denotes the time difference between the automatic and the manually corrected phoneme boundaries, and the values refer to the number of corrections in the given speech corpus. Comparing the number of corrections with the number of phonemes in Table 4, it can be concluded that about 17 to 31 percent of the phoneme boundaries were manually corrected.

Table 4.
Features of the automatic and manually corrected phonemes in the speech corpora (speaker dependent and speaker adapted systems); the table also lists the number of sentences, durations, phoneme counts and the numbers of correct phonemes, deletions, substitutions, insertions and corrections. The phoneme error rates were the following.

        SD-M1-   SD-M1-   SD-F1-   SD-F1-   SA-M1-   SA-M1-   SA-F1-   SA-F1-
        manual   auto     manual   auto     manual   auto     manual   auto
PER     0%       0.83%    0%       0.52%    0%       15.5%    0%       6%
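The phoneme error rate reported in Table 4 is conventionally computed from a Levenshtein alignment of the automatic and reference phoneme sequences, counting deletions, substitutions and insertions. A minimal sketch:

```python
# Minimal phoneme error rate (PER): Levenshtein distance between the
# reference and the hypothesized phoneme sequences, divided by the
# reference length.
def per(reference: list[str], hypothesis: list[str]) -> float:
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                          # all deletions
    for j in range(m + 1):
        d[0][j] = j                          # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution or match
    return d[n][m] / n

assert per(list("hello"), list("hallo")) == 0.2   # one substitution in five
```

The word error rate (WER) used later in Table 6 is computed the same way over word sequences instead of phoneme sequences.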

Table 5. Precision of the automatic segmentation: the number of manual phoneme boundary corrections in the SD-M1-auto, SD-F1-auto, SA-M1-auto and SA-F1-auto corpora, grouped by the size of the boundary time difference (20-29 ms, 30-39 ms, 40-49 ms, 50-59 ms and over 60 ms).

Evaluation: to determine whether manual correction of the labels leads to an increment in speech quality, CMOS and MOS listening tests were carried out. Figure 6 shows the results of the CMOS test. There was no significant difference between automatic labelling and manual correction of the automatic labels in the case of the SD-F1, SA-M1 and SA-F1 voices. Manual correction caused a significant improvement in speech quality in the case of SD-M1, although the MOS tests did not show a significant difference in any of the cases. The results of the MOS test are introduced in detail in the dissertation.

Conclusion: according to the results there are cases when it is possible to create consistently good speech quality without manual correction of the automatic labels, thus a remarkable amount of work can be saved in HMM-TTS systems. In the speaker adapted cases the CMOS and MOS tests did not show a significant difference. This result makes it reasonable to investigate the error ratio that still does not influence speech quality if only automatic methods are used. If the generative models can produce similar quality even with higher phoneme error ratios, then automatic speech recognizer (ASR) transcription based speaker adaptation may be possible. I investigate this topic in Thesis Group II.

Figure 6. Subjective evaluation of automatic labels and manually corrected automatic labels with a CMOS listening test.

5.2. Thesis Group II. Unsupervised speaker adaptation of hidden Markov-model based text-to-speech synthesis with semi-spontaneous speech

The results of Thesis I.4 prepared the vision of the completely automatic creation of new HMM-TTS voices; thus waveforms alone would be enough for speaker adaptation.
Automatic creation of HMM-TTS voices makes sense in the case of spontaneous and semi-spontaneous

speech corpora, because planned speech usually has a phonetic transcription; consequently there is no reason for unsupervised speaker adaptation. I conducted my research with semi-spontaneous speech.¹ Based on the results of related research I suggested a novel solution: the transcription of an automatic speech recognizer (ASR) is used as the basis of the adaptation database. Phoneme boundaries are determined by forced alignment with an automatically controlled beam.² Thus the method can be applied to ASRs even if confidence measures are not available. (I investigate previous work on unsupervised speaker adaptation in detail in my related papers and in my dissertation.)

First I developed a segmentation and selection algorithm suitable for semi-spontaneous speech. The goal of the segmentation is to determine the virtual sentences of semi-spontaneous speech; the goal of the selection is to select an optimal subset of the adaptation data. I performed subjective evaluation with different automatically created adaptation databases. These databases had phoneme error rates (PER) of 0%, 17%, 21%, 42%, 52%, 55%, 68%, 70%, 88% and 89%, respectively. In practice it is likely that high quality recordings are not available and recognition accuracy varies; consequently it is beneficial to test the solution under broad conditions. The procedure described in Thesis Group II contains language specific components, although the applied methodology is language independent.

Thesis II.1.

[C1, C5, C6] I designed and implemented an unsupervised procedure for speaker adaptation with semi-spontaneous speech based on the transcription of an automatic speech recognizer. I showed that the proposed method can produce quality not significantly different from supervised speaker adaptation.

I created a method for the segmentation of semi-spontaneous speech and had it recognized with a Hungarian ASR system. Furthermore, I determined the phoneme boundaries with forced alignment.
The ASR gave word level output, so forced alignment had to be performed in a separate step. From the resulting speech corpus I dismissed, with an automatic method, the utterances that are not favourable for HMM-TTS. I selected 10 minutes of the speech corpus randomly, and I created a manual transcription of these 10 minutes for reference. The average voice model was trained with the same corpora that were introduced in Thesis Group I. In the first phase I created HMM-TTS voices with semi-spontaneous speech from four speakers. The PER varied between 10% and 42% (see Table 6). In the second phase I made further experiments with a male speaker's speech corpus (M8); the PER in this case was between 17% and 89%. The features of these corpora are described in detail in the dissertation and the related publications.

Evaluation: for subjective evaluation, CMOS and MOS listening tests were carried out. According to the results, the quality of the synthetic speech increases as the PER decreases. When

¹ Semi-spontaneous (or semi-reproductive) speech has the features of live speech, although the speaker has previously planned it, usually in written form.
² The beam is a parameter of the forced alignment.

PER was lower than 55%, there was no significant difference between the unsupervised and supervised cases. The results of the CMOS tests (Figure 7) show that the supervised and unsupervised systems were considered similar in the M8-RND and M9-RND cases, and even in the M6-RND and M7-RND cases the difference was not significant. In the case of higher PERs a significant difference was measured. I investigate higher PERs in Thesis II.2.

Conclusion: the results are quite surprising, because they state that ASR transcription based speaker adaptation is not significantly different from the supervised case. This is an extension of Thesis I.4, because not only the phonetic transcription and segmentation are done automatically, but the textual transcription and the utterance selection are determined automatically as well.

Table 6. Semi-spontaneous adaptation speech corpora for unsupervised speaker adaptation.

Symbol     Speaker   Method        Selection   Duration   PER³        WER⁴
M6-S-RND   Male 6.   Supervised    Random      11.4 min   error free
M6-U-RND   Male 6.   Unsupervised  Random      11.4 min   42%         87%
M7-S-RND   Male 7.   Supervised    Random      9.6 min    error free
M7-U-RND   Male 7.   Unsupervised  Random      9.6 min    21%         74%
M8-S-RND   Male 8.   Supervised    Random      10.2 min   error free
M8-U-RND   Male 8.   Unsupervised  Random      10.2 min   17%         57%
M9-S-RND   Male 9.   Supervised    Random      9.7 min    error free
M9-U-RND   Male 9.   Unsupervised  Random      9.7 min    10%         44%

Figure 7. Quality evaluation of unsupervised speaker adaptation with semi-spontaneous speech under 50% PER by CMOS pair comparison.

Thesis II.2.

[C1, C3, C5] I designed and implemented an unsupervised method to select a favourable subset of a speech corpus for speaker adaptation, and I showed that it is possible to achieve better synthetic speech quality with the proposed method than with random selection of the adaptation speech corpus.
3 PER: Phoneme Error Rate
4 WER: Word Error Rate

The method described in Thesis II.1 resulted in not significantly worse synthetic speech quality in a fully automatic way. The PER of the adaptation corpus was smaller

than 50% in Thesis II.1. In the current thesis I investigate how the method can be enhanced when the PER of the speaker adaptation corpus is larger than 50%. Based on the results of previous research (introduced in the dissertation) and of Theses I.4 and II.1, I designed and implemented the following method: segmentation, automatic speech recognition and phoneme boundary detection are performed as in Thesis II.1, but the selection method is modified. The optimal value of the beam is determined by the quality of the waveform and the errors in the ASR transcription. Furthermore, my goal was to select about 10 minutes of adaptation data from any speech corpus. In unsupervised speaker adaptation utterances of varying quality are likely, so the quality of the ASR transcription is not predictable; thus an exact beam value cannot be determined empirically. Therefore the beam width is set iteratively to find the value at which the total length of the successfully force-aligned wave files is closest to 10 minutes (t_limit). Each virtual sentence of the semi-spontaneous speech is represented by one wave file. I run forced alignment on these files with the current beam width and measure the total length of the successfully force-aligned wave files (denoted by t_adaptation_corpus). I search for the optimal beam width with the bisection method. The core of the method may be written in pseudocode as follows:

 1. i = 0
 2. beam_max = beam[0] = maximum beam width
 3. beam_min = 0
 4. t_limit = 10 minutes
 5. DO
 6.   CALL forced alignment WITH beam[i] on each wave file
      RETURNING t_adaptation_corpus[i]
 7.   IF t_adaptation_corpus[i] > t_limit THEN
 8.     beam_max = beam[i]
 9.     beam[i+1] = beam[i] - floor((beam[i] - beam_min) / 2)
10.   ELSE
11.     beam_min = beam[i]
12.     beam[i+1] = beam[i] + floor((beam_max - beam[i]) / 2)
13.   END IF
14.   i = i + 1
15. WHILE beam[i] != beam[i-1]

The method stops when the beam value is the same in two consecutive steps.
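The bisection above can be sketched as a runnable function; `forced_align_minutes` is a hypothetical callback standing in for the actual forced aligner, returning the minutes of audio that align successfully at a given integer beam width:

```python
def find_beam(forced_align_minutes, beam_max, t_limit=10.0):
    """Bisect the beam width so that the total duration of successfully
    force-aligned files is as close as possible to t_limit minutes.
    Assumes a wider beam never yields less aligned audio."""
    beam_min, beam = 0, beam_max
    prev = None
    while beam != prev:                    # stop when two consecutive beams match
        prev = beam
        t = forced_align_minutes(beam)     # minutes of audio aligned at this beam
        if t > t_limit:
            beam_max = beam
            beam -= (beam - beam_min) // 2
        else:
            beam_min = beam
            beam += (beam_max - beam) // 2
    return beam
```

With integer beam widths the interval halves each step, so the loop terminates in O(log beam_max) alignment runs.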
Next, full-context labelling of the phonetic transcription is performed, and the result is a speaker adaptation corpus. At the beginning of the research the quality of the ASR output was high due to high-quality, domain-specific recordings. In order to investigate the deeper effects of phoneme errors, I simulated worse recognition results with 0-gram language models 5 and additive noise. With these settings the method can be tested under practical conditions, because utterances of varying quality and without domain restrictions are likely in general. The adaptation corpora with higher than 50% PER are summarized in Table 7. All of these corpora were generated in an unsupervised way (denoted by U), and the method introduced above is denoted by BBS (beam-based selection). To measure the effectiveness of the BBS method, I also created adaptation speech corpora with the random selection method; these are denoted by RND (random).

5 In a 0-gram language model each morpheme occurs once, with the same probability.

0G means 0-gram

language model; NOISE and NOISE2 denote -50 dB and -25 dB additive white noise relative to the maximum level. The maximum level of the original recordings was normalized to 0 dB per sentence. With the speech corpora in Table 7 I created speaker-adapted HMM-TTS voices.

Evaluation: a CMOS and a MOS listening test were carried out to determine the efficiency of the beam-based selection method. The naturalness and the similarity to the target speaker were measured by the MOS tests, the preference score by the CMOS tests. The results of the CMOS tests are shown in Figure 8. In the cases of higher phoneme error rates (M8-U-0G-NOISE, M8-U-0G-NOISE2) the proposed method (BBS) resulted in significantly better speech quality than the random selection method. The results of the MOS tests (described in the dissertation) show a significant difference in the case of NOISE2.

Conclusion: in the listening tests the proposed method gave significantly better results than random selection of the adaptation data, even in the case of high phoneme error rates (i.e. bad recognition results with additive noise).

Table 7. Semi-spontaneous speech corpora with simulated bad recognition results for unsupervised speaker adaptation.

Symbol               Speaker  Language model  Noise    Duration  PER   WER
M8-U-0G-RND          Male 8   0-gram          -            min   55%   100%
M8-U-0G-BBS          Male 8   0-gram          -        10 min    52%   100%
M8-U-0G-RND-NOISE    Male 8   0-gram          -50 dB   8.9 min   70%   100%
M8-U-0G-BBS-NOISE    Male 8   0-gram          -50 dB   9.4 min   68%   100%
M8-U-0G-RND-NOISE2   Male 8   0-gram          -25 dB   9.7 min   89%   100%
M8-U-0G-BBS-NOISE2   Male 8   0-gram          -25 dB   10.2 min  88%   100%

Figure 8. Subjective evaluation of RND and BBS methods with CMOS listening tests.

Thesis Group III. Optimizing Hidden Markov-model based text-to-speech synthesis for low-resource devices.

HMM-TTS speech generation runs faster than real time on modern desktop computers. On low-resource devices, i.e.
smartphones, the calculations must still be optimized to achieve low response times with near real-time functionality. Optimizing a speech synthesis system on mobile devices is a challenging task because both the storage capacity and the computing power are limited. The latest high-end mobile devices possess large storage size

and high-performance CPUs. Speech synthesis still needs to compete with other applications for precious storage space, and the computing power is also shared among system and third-party processes. A further disadvantage of resource-demanding computations is that they cause higher power consumption and shorter battery life. My research includes the introduction of codebook-based noise excitation; the investigation of the relationship between line spectral pairs, parameter streams and perceived quality; and the parallelization of parameter generation, the vocoding algorithm and waveform playback. In this thesis group the most computation-intensive parts of synthesis are identified, and I design and implement methods for decreasing the computation and response times. I investigate the synthetic speech quality after each incremental step with listening tests. I measure the time required for loading the database, for the parameter generation algorithm and for the vocoder algorithm until the system responds; in the following I refer to these three stages as (1), (2) and (3). I carried out the measurements on three different smartphones, shown in Table 8. In this thesis group the research was carried out with an English speech corpus (CMU ARCTIC / SLT) [13].

Table 8. The devices used in the experiments on optimizing HMM-TTS.

Device          CPU type       Speed [MHz]
Mob1 (iPhone)   Samsung ARM
Mob2 (Spica)    Samsung S3C
Mob3 (Desire)   Qualcomm QSD

Thesis III.1. [J1, C4] I designed a low-resource model for hidden Markov-model based speech synthesis, and I showed experimentally that the proposed model significantly improves the performance without significant loss in quality.

In impulse-noise excitation based vocoders the excitation of unvoiced sounds is modelled with Gaussian noise, which is generated with the Box-Muller procedure [23]. This method is widely used in HMM-TTS systems as well.
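For reference, the Box-Muller transform maps a pair of uniform random numbers to a pair of independent standard Gaussian samples; a textbook sketch:

```python
import math
import random

def box_muller(rng):
    """One pair of independent standard Gaussian samples from two
    uniform samples, via the Box-Muller transform."""
    u1 = 1.0 - rng.random()              # in (0, 1]; avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))   # Rayleigh-distributed radius
    theta = 2.0 * math.pi * u2           # uniform angle
    return r * math.cos(theta), r * math.sin(theta)
```

The `sqrt`, `log` and trigonometric calls make this relatively expensive in floating point, which is what motivates the codebook-based alternative below.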
Codebook-based Gaussian noise generation, which uses only integer operations, achieves a significant gain in performance (about ten times faster) compared to floating-point arithmetic [24]. Codebook-based noise generation and integer operations mean a loss in precision compared to the Box-Muller procedure, although no significant loss in perceived quality on mobile phones was expected. Thus I introduced this method on low-resource devices. Next I modified the modelling of spectral parameters. HMM-TTS systems generally use MGC (Mel-Generalized Cepstrum) and MGC-LSP (Mel-Generalized Cepstrum Line Spectral Pair) parameters [25]. MGC is the generalized logarithm of the spectrum modified by the perception-based Mel scale. Spectral filtering with MGC and MGC-LSP parameters is usually carried out with MLSA (Mel Log Spectrum Approximation) filters. The ideal transfer characteristic of the MLSA filter cannot be realized, so in practice it is approximated by a 20th order Padé approximation. The complexity of speech synthesis therefore depends on both the order of the spectral analysis and the order of the Padé approximation. If we change MGC and MGC-LSP spectral modelling to LSP, then the spectral filtering can be performed by LPC;

thus the complexity of the system will depend only on the order of the spectral analysis. Further enhancement in speed can be achieved by reducing the order of the LSP analysis, although this affects the quality of the synthetic speech as well. I created HMM-TTS systems with 24th, 22nd, 20th, 18th, 14th, 12th and 10th order LSP analysis. (In the case of the 24th, 22nd and 20th order filters the listening tests were not carried out with all the test subjects, because preliminary listening tests by speech experts did not show significant differences from the 18th order LSP case.) The depth of the decision trees influences both the computational cost and the footprint size. Fewer leaves and nodes decrease the computational cost and the footprint size. Smaller decision trees result in degraded speech quality, because larger sets of parameters are clustered in the leaves. The numbers of leaves in the decision trees used during further optimization are shown in Table 9.

Evaluation: all these steps cause some loss in speech quality; consequently, it is important to investigate whether this loss is significant. The calculation time measurements were performed incrementally, and a listening test was carried out with all the resulting systems. The systems compared are the baseline; codebook-based noise; 18th, 14th, 12th and 10th order LSP; and 12th order LSP with decision trees #1, #2 and #3. The results of the calculation time measurements are shown in Figure 9 and the results of the listening tests in Figure 10. Figure 9 shows parts (1), (2) and (3) of the calculation time measurements stacked in one column, so the response time of the system (the sum of the three parts) can easily be seen.

Table 9. Decision trees with different sizes used for optimization (symbols Baseline, #1, #2 and #3; the numbers of leaves of the LSP, logF0 and duration trees and the footprint sizes in KByte).

Figure 9.
Calculation time measurements of the HMM-TTS system on low-resource devices (response time in seconds on Mob1, Mob2 and Mob3).
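The complexity argument behind replacing MLSA filtering with LPC can be illustrated with a direct-form all-pole synthesis filter: the work per output sample is exactly p multiply-adds for an order-p analysis, with no additional Padé stage. A minimal sketch (not the vocoder of the experimental system; the coefficients are illustrative):

```python
def lpc_synthesis(excitation, a):
    """All-pole (LPC) synthesis filter: y[n] = e[n] - sum_k a[k] * y[n-k].
    Runtime per sample is len(a) multiply-adds, i.e. linear in the
    analysis order, which is why reducing the LSP order speeds it up."""
    p = len(a)
    y = [0.0] * len(excitation)
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * y[n - k]
        y[n] = acc
    return y
```

For example, a single coefficient a = [-0.5] produces the impulse response 1, 0.5, 0.25, ... of the one-pole filter 1 / (1 - 0.5 z^-1).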

Figure 10. Subjective evaluation of the optimization steps of Thesis III.1 with MOS listening tests (baseline; codebook-based noise; 18th, 14th, 12th and 10th order LSP; 12th order LSP with decision trees #1, #2 and #3).

Conclusion: in the case of 12th order LSP with codebook-based noise generation and an approximately 30% reduction of the decision tree size, there was no significant loss in quality, while the calculations became about five times faster. In the other cases there was either less improvement in performance or the quality decreased significantly.

Thesis III.2. [J1, C4] I designed and implemented a parallel method for the resource-demanding processes of HMM-TTS synthesis (parameter generation, vocoder algorithm) that takes the actual system load into account, and I showed experimentally that the proposed method significantly improves the response time without loss in quality.

After the optimization steps of Thesis III.1 I designed a method to reduce the response time of the HMM-TTS system. This method does not affect the quality of the synthetic speech. I extended the time-recursive parameter generation algorithm [26] as follows: the vocoder algorithm and waveform playback are performed in segments, and the segment size is set according to the performance and actual load of the system. In general, text-to-speech engines do not include waveform playback, in order to remain platform independent. By introducing waveform playback, the response time of the system can be reduced through the parallelization of parameter generation, the vocoder algorithm and waveform playback, although platform-specific steps must be taken. The parallelization can be realized in the following way (a segment is defined as a parameter stream of k frames):

1. The time-recursive parameter generation algorithm is calculated for the given segment (k frames), and the parameter stream is passed to the vocoder algorithm. The computation continues with the next segment.
2.
The vocoder algorithm generates the waveform from the parameter stream of the segment.
3. The segment's waveform is added to the playback queue.
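The three steps above can be sketched as a producer-consumer pipeline in which segment i is played back while segment i+1 is vocoded and segment i+2 is generated. The stage functions here are placeholders for the real parameter generation, vocoder and playback routines:

```python
import queue
import threading

def run_pipeline(segments, generate_params, vocode, play):
    """Step 1 (parameter generation) and step 2 (vocoder) each run in their
    own thread; step 3 (playback) runs in the calling thread. Bounded queues
    connect the stages, so the stages overlap in time."""
    q_params = queue.Queue(maxsize=4)   # parameter streams awaiting vocoding
    q_wave = queue.Queue(maxsize=4)     # waveforms awaiting playback

    def producer():
        for seg in segments:
            q_params.put(generate_params(seg))
        q_params.put(None)              # end-of-stream marker

    def vocoder():
        while (params := q_params.get()) is not None:
            q_wave.put(vocode(params))
        q_wave.put(None)

    threads = [threading.Thread(target=producer), threading.Thread(target=vocoder)]
    for t in threads:
        t.start()
    while (wave := q_wave.get()) is not None:
        play(wave)                      # playback queue is drained in order
    for t in threads:
        t.join()
```

The bounded queue sizes keep memory use constant regardless of utterance length.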

I determine the segment length at runtime by analogy with audio playback over computer networks. Ramjee et al. designed a network audio playback method [27], which I introduce briefly in the following. Let n_i be the delay of the i-th audio packet. The delay estimate (d_i) and its variation (v_i) are calculated for every incoming packet as follows:

d_i = A * d_{i-1} + (1 - A) * n_i          (3)
v_i = A * v_{i-1} + (1 - A) * |d_i - n_i|  (4)

Equations (3) and (4) are calculated for each packet, but they are used after pauses only. The time at which packet i is played out at the receiving host after a pause is given by:

p_i = t_i + d_i + B * v_i                  (5)

where t_i is the time at which packet i was generated. The constant A in equations (3) and (4) determines the memory of the approximation; the delay / packet loss trade-off is set by the constant B in equation (5). In practice B = 4 is often used.

I tailored this method to speech synthesis in HMM-TTS systems. Let n_i denote the time required by the parameter generation and vocoder algorithms for the i-th segment. The values of d_i, v_i and p_i are calculated according to equations (3)-(5) with initial values d_1 = n_1, k_1 = 30, v_0 = 0 and constants A = 0.99, B = 4 (i > 0). The number of frames in the (i+1)-th segment (k_{i+1}) is recalculated from these estimates after every 60 frames according to equation (6), given in full in the dissertation (T_frame, the length of a frame, is 25 ms in the experimental system). The schematic block diagram of the method is shown in Figure 11 and is described in the dissertation in detail.

Evaluation: the method described above does not influence the speech quality; consequently, listening tests were not necessary. The results of the calculation time measurements are shown in Figure 12.

Conclusion: the results of Thesis III.2 show about a five-fold improvement in response time compared to the system of Thesis III.1. Compared to the baseline system, the response time is about twenty times faster according to the combined results of Thesis Group III.
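The smoothed estimates of equations (3)-(5), with n_i reinterpreted as the computation time of the i-th segment, can be sketched as a single update step (the segment-size rule of equation (6) is not reproduced here, so only the d_i / v_i recursion and the resulting safety margin are shown):

```python
def update_delay_estimate(d_prev, v_prev, n_i, A=0.99, B=4):
    """One step of the exponentially smoothed delay estimate (eq. 3),
    its variation (eq. 4), and the d_i + B * v_i safety margin that
    enters the playout time of eq. 5. A close to 1 gives a long memory;
    B trades latency against the risk of the playback queue running dry."""
    d_i = A * d_prev + (1 - A) * n_i
    v_i = A * v_prev + (1 - A) * abs(d_i - n_i)
    return d_i, v_i, d_i + B * v_i
```

Feeding each segment's measured computation time through this update lets the segment size track the device's actual load instead of a fixed worst-case estimate.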

Figure 11. Schematic block diagram of the parallelization of parameter generation, the vocoder algorithm and waveform playback in HMM-TTS systems based on the actual load. (Blocks: context-dependent labels -> time-recursive parameter generation -> LSP and F0 parameters -> vocoder -> size-n playback queue holding the i-th to (i-n)-th waveforms; d_i and v_i are calculated per segment, and p_i and k_{i+1} are recalculated after every 60 frames.)

Figure 12. Enhancement of response times after parallelization (stacked parts (1)-(3) of the response time in seconds on Mob1, Mob2 and Mob3, for the 12th order LSP #1 system and the parallelized system).

6. Practical application of scientific results

Besides the theoretical outcomes of the thesis groups introduced in this booklet, their practical application is also an important factor. A high-quality, domain-free Hungarian text-to-speech engine was created based on the results of Theses I.1 and I.2. The general applicability of this TTS engine makes several speech-enabled systems possible, e.g. screen readers for blind users, interactive voice response systems, prompt generators and other systems with a speech user interface. The results of Thesis I.2 could be extended to other languages. Based on the results of Thesis Group I, about 10 minutes of utterances are enough to create new voice characteristics. Thesis I.4 suggests that manual correction of the adaptation database is not absolutely necessary. The novel outcome of Thesis Group II is the possibility of creating new HMM-TTS voices without manual work. With the unsupervised adaptation method introduced in this thesis group, it is possible to create thousands of voices automatically, e.g. from a speech corpus of telephone conversations. The results also make it possible to tailor the voice of a particular system to a target speaker's voice characteristics; thus a smartphone can learn the voice characteristics of its owner automatically. The results of Thesis II.1 give a method for unsupervised speaker adaptation in the case of better recognition results, while Thesis II.2 extends the method to worse recognition results. These methods were tested with Hungarian speech corpora, although they do not contain any language-specific parts. The systems of Thesis Groups I and II could be realized on low-resource devices, i.e. smartphones, based on the results of Thesis Group III. The experimental system of this thesis group takes into account the performance and actual load of the system during synthesis on Google Android 2.x and 4.x smartphones.
The application programming interface (API) level realization of the HMM-TTS makes wider usage possible. The resulting system can be used in any speech-enabled application on the device, including message readers (SMS, e-mail, social network messages, etc.), e-book readers, screen readers and navigation systems. The research of Thesis Group III was carried out with an English HMM-TTS, although the solution does not contain any language-specific element: it can be applied to other languages as well. Furthermore, the voice characteristics of the mobile HMM-TTS system can be modified based on the results of Thesis Groups I and II. The results of all thesis groups were implemented in experimental systems.


More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Quarterly Progress and Status Report. VCV-sequencies in a preliminary text-to-speech system for female speech

Quarterly Progress and Status Report. VCV-sequencies in a preliminary text-to-speech system for female speech Dept. for Speech, Music and Hearing Quarterly Progress and Status Report VCV-sequencies in a preliminary text-to-speech system for female speech Karlsson, I. and Neovius, L. journal: STL-QPSR volume: 35

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS

OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,

More information

Voice conversion through vector quantization

Voice conversion through vector quantization J. Acoust. Soc. Jpn.(E)11, 2 (1990) Voice conversion through vector quantization Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara A TR Interpreting Telephony Research Laboratories,

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

An Online Handwriting Recognition System For Turkish

An Online Handwriting Recognition System For Turkish An Online Handwriting Recognition System For Turkish Esra Vural, Hakan Erdogan, Kemal Oflazer, Berrin Yanikoglu Sabanci University, Tuzla, Istanbul, Turkey 34956 ABSTRACT Despite recent developments in

More information

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report

Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Linking the Common European Framework of Reference and the Michigan English Language Assessment Battery Technical Report Contact Information All correspondence and mailings should be addressed to: CaMLA

More information

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler

Machine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina

More information

Speaker Identification by Comparison of Smart Methods. Abstract

Speaker Identification by Comparison of Smart Methods. Abstract Journal of mathematics and computer science 10 (2014), 61-71 Speaker Identification by Comparison of Smart Methods Ali Mahdavi Meimand Amin Asadi Majid Mohamadi Department of Electrical Department of Computer

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,

More information

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

On-the-Fly Customization of Automated Essay Scoring

On-the-Fly Customization of Automated Essay Scoring Research Report On-the-Fly Customization of Automated Essay Scoring Yigal Attali Research & Development December 2007 RR-07-42 On-the-Fly Customization of Automated Essay Scoring Yigal Attali ETS, Princeton,

More information

School of Innovative Technologies and Engineering

School of Innovative Technologies and Engineering School of Innovative Technologies and Engineering Department of Applied Mathematical Sciences Proficiency Course in MATLAB COURSE DOCUMENT VERSION 1.0 PCMv1.0 July 2012 University of Technology, Mauritius

More information

BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY

BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY BODY LANGUAGE ANIMATION SYNTHESIS FROM PROSODY AN HONORS THESIS SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE OF STANFORD UNIVERSITY Sergey Levine Principal Adviser: Vladlen Koltun Secondary Adviser:

More information

On the Formation of Phoneme Categories in DNN Acoustic Models

On the Formation of Phoneme Categories in DNN Acoustic Models On the Formation of Phoneme Categories in DNN Acoustic Models Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-

More information

Body-Conducted Speech Recognition and its Application to Speech Support System

Body-Conducted Speech Recognition and its Application to Speech Support System Body-Conducted Speech Recognition and its Application to Speech Support System 4 Shunsuke Ishimitsu Hiroshima City University Japan 1. Introduction In recent years, speech recognition systems have been

More information

Python Machine Learning

Python Machine Learning Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled

More information

Perceived speech rate: the effects of. articulation rate and speaking style in spontaneous speech. Jacques Koreman. Saarland University

Perceived speech rate: the effects of. articulation rate and speaking style in spontaneous speech. Jacques Koreman. Saarland University 1 Perceived speech rate: the effects of articulation rate and speaking style in spontaneous speech Jacques Koreman Saarland University Institute of Phonetics P.O. Box 151150 D-66041 Saarbrücken Germany

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Using GIFT to Support an Empirical Study on the Impact of the Self-Reference Effect on Learning

Using GIFT to Support an Empirical Study on the Impact of the Self-Reference Effect on Learning 80 Using GIFT to Support an Empirical Study on the Impact of the Self-Reference Effect on Learning Anne M. Sinatra, Ph.D. Army Research Laboratory/Oak Ridge Associated Universities anne.m.sinatra.ctr@us.army.mil

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

Segregation of Unvoiced Speech from Nonspeech Interference

Segregation of Unvoiced Speech from Nonspeech Interference Technical Report OSU-CISRC-8/7-TR63 Department of Computer Science and Engineering The Ohio State University Columbus, OH 4321-1277 FTP site: ftp.cse.ohio-state.edu Login: anonymous Directory: pub/tech-report/27

More information

Physics 270: Experimental Physics

Physics 270: Experimental Physics 2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu

More information

12- A whirlwind tour of statistics

12- A whirlwind tour of statistics CyLab HT 05-436 / 05-836 / 08-534 / 08-734 / 19-534 / 19-734 Usable Privacy and Security TP :// C DU February 22, 2016 y & Secu rivac rity P le ratory bo La Lujo Bauer, Nicolas Christin, and Abby Marsh

More information

Evolutive Neural Net Fuzzy Filtering: Basic Description

Evolutive Neural Net Fuzzy Filtering: Basic Description Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:

More information

Progress Monitoring for Behavior: Data Collection Methods & Procedures

Progress Monitoring for Behavior: Data Collection Methods & Procedures Progress Monitoring for Behavior: Data Collection Methods & Procedures This event is being funded with State and/or Federal funds and is being provided for employees of school districts, employees of the

More information

A Hybrid Text-To-Speech system for Afrikaans

A Hybrid Text-To-Speech system for Afrikaans A Hybrid Text-To-Speech system for Afrikaans Francois Rousseau and Daniel Mashao Department of Electrical Engineering, University of Cape Town, Rondebosch, Cape Town, South Africa, frousseau@crg.ee.uct.ac.za,

More information

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models

Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models Navdeep Jaitly 1, Vincent Vanhoucke 2, Geoffrey Hinton 1,2 1 University of Toronto 2 Google Inc. ndjaitly@cs.toronto.edu,

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Circuit Simulators: A Revolutionary E-Learning Platform

Circuit Simulators: A Revolutionary E-Learning Platform Circuit Simulators: A Revolutionary E-Learning Platform Mahi Itagi Padre Conceicao College of Engineering, Verna, Goa, India. itagimahi@gmail.com Akhil Deshpande Gogte Institute of Technology, Udyambag,

More information

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling

Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

ECE-492 SENIOR ADVANCED DESIGN PROJECT

ECE-492 SENIOR ADVANCED DESIGN PROJECT ECE-492 SENIOR ADVANCED DESIGN PROJECT Meeting #3 1 ECE-492 Meeting#3 Q1: Who is not on a team? Q2: Which students/teams still did not select a topic? 2 ENGINEERING DESIGN You have studied a great deal

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

Automatic Pronunciation Checker

Automatic Pronunciation Checker Institut für Technische Informatik und Kommunikationsnetze Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology Zurich Ecole polytechnique fédérale de Zurich Politecnico federale

More information

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project

Phonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California

More information

On Human Computer Interaction, HCI. Dr. Saif al Zahir Electrical and Computer Engineering Department UBC

On Human Computer Interaction, HCI. Dr. Saif al Zahir Electrical and Computer Engineering Department UBC On Human Computer Interaction, HCI Dr. Saif al Zahir Electrical and Computer Engineering Department UBC Human Computer Interaction HCI HCI is the study of people, computer technology, and the ways these

More information

Unit Selection Synthesis Using Long Non-Uniform Units and Phonemic Identity Matching

Unit Selection Synthesis Using Long Non-Uniform Units and Phonemic Identity Matching Unit Selection Synthesis Using Long Non-Uniform Units and Phonemic Identity Matching Lukas Latacz, Yuk On Kong, Werner Verhelst Department of Electronics and Informatics (ETRO) Vrie Universiteit Brussel

More information

Major Milestones, Team Activities, and Individual Deliverables

Major Milestones, Team Activities, and Individual Deliverables Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering

More information

English Language and Applied Linguistics. Module Descriptions 2017/18

English Language and Applied Linguistics. Module Descriptions 2017/18 English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,

More information

Grade 6: Correlated to AGS Basic Math Skills

Grade 6: Correlated to AGS Basic Math Skills Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and

More information

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic

More information

Ohio s Learning Standards-Clear Learning Targets

Ohio s Learning Standards-Clear Learning Targets Ohio s Learning Standards-Clear Learning Targets Math Grade 1 Use addition and subtraction within 20 to solve word problems involving situations of 1.OA.1 adding to, taking from, putting together, taking

More information

BENCHMARK TREND COMPARISON REPORT:

BENCHMARK TREND COMPARISON REPORT: National Survey of Student Engagement (NSSE) BENCHMARK TREND COMPARISON REPORT: CARNEGIE PEER INSTITUTIONS, 2003-2011 PREPARED BY: ANGEL A. SANCHEZ, DIRECTOR KELLI PAYNE, ADMINISTRATIVE ANALYST/ SPECIALIST

More information

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM

ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM ADDIS ABABA UNIVERSITY SCHOOL OF GRADUATE STUDIES MODELING IMPROVED AMHARIC SYLLBIFICATION ALGORITHM BY NIRAYO HAILU GEBREEGZIABHER A THESIS SUBMITED TO THE SCHOOL OF GRADUATE STUDIES OF ADDIS ABABA UNIVERSITY

More information

Computerized Adaptive Psychological Testing A Personalisation Perspective

Computerized Adaptive Psychological Testing A Personalisation Perspective Psychology and the internet: An European Perspective Computerized Adaptive Psychological Testing A Personalisation Perspective Mykola Pechenizkiy mpechen@cc.jyu.fi Introduction Mixed Model of IRT and ES

More information