

Determination of A Priori Decision Thresholds for Phrase-Prompted Speaker Verification

M. W. Mak, W. D. Zhang, and M. X. He
Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong

Abstract

Speaker verification systems are often compared based on an equal error rate (equal chance of false acceptance and false rejection) obtained by adjusting a decision threshold during verification. However, the threshold should be found before verification because the identity of a claimant is actually unknown in real-world situations. This paper presents a novel method to determine the decision thresholds of speaker verification systems using enrollment data only. In the method, a speaker model is trained to differentiate the voice of the corresponding speaker from that of a general population. This is accomplished by using the speaker's utterances and those of some other speakers (denoted as anti-speakers) as the training set. Then, an operation environment is simulated by presenting the utterances of some pseudo-impostors (none of whom is an anti-speaker) to the speaker model. The threshold is adjusted until the chance of falsely accepting a pseudo-impostor falls below an application dependent level. Experimental evaluations based on 138 speakers of the YOHO corpus suggest that, with a simulated operation environment, the method is able to determine the best compromise between false acceptance and false rejection.

Keywords: Speaker verification; threshold determination; elliptical basis function networks.

I. Introduction

The determination of decision thresholds is a very important problem in speaker verification. A large threshold could make the system annoying to users, while a small one could result in a vulnerable system.
Conventional threshold determination methods [1], [2] typically compute the distribution of inter- and intra-speaker distortions, and then choose a threshold to equalize the overlapping area of the distributions, i.e. to equalize the false acceptance rate (FAR) and false rejection rate (FRR). The success of this approach, however, relies on whether the estimated distributions match the speaker- and impostor-class distributions. Another approach derives the threshold of a speaker solely from his/her own voice and speaker model [3]. Session-to-session speaker variability, however, contributes much bias to the threshold, rendering the verification system unusable. Due to the difficulty in determining a reliable threshold, researchers often report the equal error rate (ERR) of verification systems based on the assumption that an a posteriori threshold can be optimally adjusted during verification. A real-world application, however, is only realistic with a priori thresholds, which should be determined before verification.

(Footnote: This project was supported by the H.K. Polytechnic University Grant No A42. M.X. He is with the Ocean Remote Sensing Institute, Ocean University of Qingdao, China.)

In recent years, research effort has focused on the normalization of speaker scores to minimize error rates. This includes the likelihood ratio scoring proposed by Higgins et al. [4], where verification decisions are based on the ratio of the likelihood that the observed speech is uttered by the true speaker to the likelihood that it is spoken by an impostor. The a priori threshold is then set to 1.0, with the claimant being accepted (rejected) if the ratio is greater (less) than 1.0. Subsequent work based on likelihood normalization [5], [6], cohort normalized scoring [7], and minimum verification error training [8] also shows that including an impostor model during verification not only improves speaker separability, but also allows decision thresholds to be easily set.
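As an illustration of the likelihood ratio rule just described, the following minimal sketch accepts a claimant when the ratio of speaker to impostor likelihood exceeds a threshold of 1.0. The one-dimensional Gaussian models, the function names, and the toy frame values are assumptions made for this example; they are not the models used in [4] or in this paper.

```python
import math

def gaussian_log_likelihood(frames, mean, var):
    """Total log-likelihood of scalar frames under a 1-D Gaussian (toy model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in frames
    )

def accept_by_likelihood_ratio(frames, speaker, impostor, threshold=1.0):
    """Accept when p(frames | speaker) / p(frames | impostor) > threshold."""
    log_ratio = (gaussian_log_likelihood(frames, *speaker)
                 - gaussian_log_likelihood(frames, *impostor))
    # Compare in the log domain to avoid underflow on long utterances.
    return log_ratio > math.log(threshold)

# Hypothetical (mean, variance) pairs for the two classes.
speaker_model = (1.0, 0.5)
impostor_model = (-1.0, 0.5)

genuine_decision = accept_by_likelihood_ratio([0.9, 1.2, 0.7], speaker_model, impostor_model)
impostor_decision = accept_by_likelihood_ratio([-0.8, -1.1, 0.1], speaker_model, impostor_model)
print(genuine_decision, impostor_decision)
```

With a threshold of 1.0 the decision depends only on which model explains the frames better, which is why such a threshold is easy to set but gives no direct control over the balance between FAR and FRR.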
Although these approaches help to select an appropriate threshold, they may cause the system to favor rejecting true speakers, resulting in a high FRR. For example, Higgins et al. [4] reported that the FRR is more than 10 times larger than the FAR. A recent report [9] based on a similar normalization technique but a different threshold setting procedure also found that the average of FAR and FRR is about 3 to 5 times larger than the ERR, suggesting that the ERR could be an over-optimistic estimate of the true system performance.

This paper proposes an a priori threshold determination method to address the above problem. The method differs from that of Higgins et al. in that, rather than using a ratio speaker set formed by pooling the nearest reference speakers, we use two speaker sets, namely an anti-speaker set and a pseudo-impostor set, to determine the threshold. For each speaker, a speaker model is trained to differentiate the voices of the speaker and the anti-speakers. Then, the pseudo-impostor set is used to determine the threshold. To enhance the capability of the speaker models without increasing the enrollment time, we sample the utterances of 45 anti-speakers and 45 pseudo-impostors to form a training set for building the speaker models and for determining the thresholds. In this way, an operation environment for the speaker model is effectively simulated. Experimental results show that the simulated operation environment enables the verification performance to be accurately predicted at enrollment time, thereby providing a reliable means of determining the decision thresholds.

This paper is organized as follows. Section II outlines the speaker models and the verification procedure. The a priori threshold determination methods are explained in Section III, and their performance is compared in Section IV. The proposed speaker models and threshold determination methods are compared with those of Higgins et al. [4] in Section V. Finally, we conclude our discussions in Section VI.

II. Speaker Verification

A. Speaker Models: EBF Networks

Elliptical basis function (EBF) networks have been used as speaker models in this work [10]. EBF networks can be applied to speaker verification as follows. Each registered speaker is assigned an EBF network with two outputs. The first output is trained to output a '1' for the speaker's speech and a '0' for other speakers' utterances, and vice versa for the second output. Therefore, two sets of data are required for constructing a speaker model, one derived from the speaker and another from other speakers. We denote the second set of data as the anti-speaker set hereafter. Of particular interest is that the EBF networks incorporate the idea of likelihood ratio scoring in their discriminative training procedure. An EBF network does not require a set of cohort or background speakers during verification; rather, it embeds the characteristics of the background speakers in its parameter estimation procedure during enrollment.

B. Verification Procedure

For each verification session, the feature vectors derived from the utterances of a claimant are concatenated to form a vector sequence $\mathcal{T} = [\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_T]$. The sequence is then divided into a number of overlapping segments, each containing $T_s$ consecutive vectors. Note that this approach is similar to that of [11], where each segment is considered to be independent. For a segment $\mathcal{T}_s$, the normalized average outputs

$$ z_k = \frac{1}{T_s} \sum_{\vec{x} \in \mathcal{T}_s} \frac{e^{\tilde{y}_k(\vec{x})}}{\sum_{r=1}^{2} e^{\tilde{y}_r(\vec{x})}}, \qquad k = 1, 2 \tag{1} $$

corresponding to the speaker and anti-speaker classes are computed, where $\tilde{y}_k(\vec{x}) = y_k(\vec{x}) / P(C_k)$ represents the scaled output and $P(C_k)$ the prior probability of class $C_k$. Verification decisions are based on the criterion:

$$ \text{If } z_1 - z_2 \begin{cases} > \zeta & \text{then accept the claimant} \\ \le \zeta & \text{then reject the claimant} \end{cases} \tag{2} $$

where $\zeta \in [-1, 1]$ is an a priori threshold that has been determined during enrollment (see Section III below). A verification decision is made for each segment, and the error rate (either FAR or FRR) is the proportion of incorrect verification decisions to the total number of decisions. Details of the verification procedure can be found in [10].

III. Threshold Determination

A. FARs and FRRs versus Thresholds

To determine the a priori thresholds, we need to obtain the FAR and FRR as a function of thresholds using enrollment data only. We propose three methods to achieve this goal. They are denoted as Baseline, Pseudo-Impostor Based Threshold Determination (PIBTD), and Sampling Pseudo-Impostor Based Threshold Determination (SPIBTD) in this paper.

A.1 Baseline

This method, being very similar to that of Higgins et al. [4], forms a baseline for comparison. Specifically, for each registered speaker in the system, five other speakers whose speech is closest to that of the speaker are selected from the population to form an anti-speaker set. Note that this is analogous to the ratio speakers of Higgins et al. The speech of the speaker and the anti-speakers is used to train a speaker model. Then, the same speech data from the anti-speakers are applied to the speaker model according to the above verification procedure. The FAR as a function of the threshold is obtained by adjusting the threshold, resulting in an FAR curve. Similarly, the speaker's utterances that have been used to train the speaker model are presented to the model to obtain an FRR curve. A typical example of these curves is shown in Fig. 1 (Baseline). It shows that the FAR during verification is considerably higher than that during enrollment, suggesting that the system is vulnerable to impostor attacks.

A.2 Pseudo-Impostor Based Threshold Determination (PIBTD)

Using the same set of utterances for training the speaker model as well as for producing the FAR and FRR curves has a serious drawback.
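The verification procedure of Section II-B, which underlies every FAR and FRR curve in this section, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the raw network outputs are taken as given (the EBF network itself is not implemented), and the function names, toy output values, and priors are hypothetical.

```python
import math

def segment_score(outputs, priors):
    """Return z1 - z2 for one segment, following (1).

    outputs: list of (y1, y2) raw network outputs, one pair per feature vector.
    priors:  (P(C1), P(C2)) class priors used to scale the outputs.
    """
    z = [0.0, 0.0]
    for y in outputs:
        scaled = [y[k] / priors[k] for k in range(2)]
        peak = max(scaled)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        for k in range(2):
            z[k] += exps[k] / total
    n = len(outputs)
    return z[0] / n - z[1] / n

def verify(outputs, priors, threshold, segment_len):
    """Slide a window forward one vector at a time; one decision per segment, as in (2)."""
    decisions = []
    for start in range(len(outputs) - segment_len + 1):
        window = outputs[start:start + segment_len]
        decisions.append(segment_score(window, priors) > threshold)
    return decisions

# Hypothetical network outputs for four consecutive feature vectors.
toy_outputs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.7, 0.3)]
decisions = verify(toy_outputs, priors=(0.5, 0.5), threshold=0.0, segment_len=2)
print(decisions)
```

Because the two normalized outputs sum to one for every vector, the score $z_1 - z_2$ always lies in $[-1, 1]$, which is why the threshold is searched over that interval.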
After training, the speaker model is likely to be biased towards representing the training utterances of the speaker and anti-speakers. If the same set of utterances is applied to the speaker model for producing the FAR and FRR curves, the curves obtained are likely to be biased. To resolve this problem, PIBTD uses an alternative set of speakers, called the pseudo-impostor set, together with another set of utterances produced by the registered speaker (which the speaker model has never seen before) to obtain the FAR and FRR curves. More specifically, after training the speaker model, five pseudo-impostors are randomly selected from the population and applied to the speaker model. These pseudo-impostors, being different from the anti-speakers and never seen by the speaker model before, are more likely to form a better representation of the impostor population. This prevents the FAR curve from shifting drastically along the threshold axis during verification, as shown in Fig. 1 (PIBTD). Therefore, the verification error rate becomes more predictable.

A.3 Sampling Pseudo-Impostor Based Threshold Determination (SPIBTD)

The representation of the impostor population can be improved by increasing the number of pseudo-impostors and anti-speakers. However, increasing the size of these sets will also lead to an unrealistically long enrollment time. SPIBTD aims at reducing the error rate and improving the robustness of the thresholds without increasing the enrollment time. The basic idea is to randomly select the feature vectors from a large number of pseudo-impostors and anti-speakers for training a speaker model as well as for determining a threshold. In this way, the number of training vectors and the enrollment time remain the same as in PIBTD. Another advantage of this sampling strategy is that the resulting training vectors become more representative of the impostor population, for they are derived from more pseudo-impostors than in PIBTD. As shown in Fig. 1 (SPIBTD), this makes the position of the FAR curves more predictable as compared to PIBTD. The reduction in the displacement between the FAR curves obtained during enrollment and verification means that the FAR curve may be used to determine the threshold. More specifically, the threshold is adjusted until the FAR obtained during enrollment falls below an application dependent level.

B. Threshold Selection Schemes

Once the FAR and FRR curves are obtained, a threshold can be determined as follows. If the FAR and FRR curves cross each other, the crossing point is chosen as the a priori threshold. In that case, the a priori threshold equalizes the chance of false acceptance and false rejection during enrollment. However, when the FAR and FRR curves do not cross each other, there exists a range of thresholds for which both FAR and FRR are zero. Four threshold selection schemes are proposed to handle this situation, and they are summarized in Table I.

IV. Experimental Evaluations

In this work, all of the 138 speakers (108 male, 30 female) in the YOHO corpus [12] have been used for the experimental evaluations. For each speaker in the corpus, there are 4 enrollment sessions with 24 utterances in each session and 10 verification sessions of 4 utterances each. Each utterance is composed of three 2-digit numbers. All sessions were recorded in an office environment using a high quality telephone handset and sampled at 8 kHz.

The enrollment process involves two steps.
First, for each speaker in the corpus, 72 utterances from the speaker's first three enrollment sessions and 480 utterances from the 4 enrollment sessions of 5 anti-speakers (Baseline and PIBTD) were used to train a speaker model. For SPIBTD, the 480 utterances were randomly selected from 45 anti-speakers. Second, the a priori threshold was determined by using either the anti-speaker set (Baseline) or the pseudo-impostor set (PIBTD and SPIBTD; see footnote 1) together with the speaker's speech. The speaker's speech was derived from the training utterances (Baseline) or from other utterances that the model had never seen before (PIBTD and SPIBTD).

Footnote 1: Five pseudo-impostors were used in PIBTD, whereas in SPIBTD, the pseudo-impostor set was constructed by randomly selecting the feature vectors of 45 pseudo-impostors.

Verification was performed using each speaker in the corpus as a claimant, with 45 impostors being randomly selected from the remaining speakers (excluding the anti-speakers and pseudo-impostors) and rotating through all speakers. The speaker's utterances, which were derived from his/her 10 verification sessions, were concatenated to form a sequence of feature vectors. Similarly, impostors' feature vectors were randomly selected from the utterances of the 45 impostors and then concatenated to form a vector sequence whose length is the same as that formed by the speaker's utterances. Verification decisions were made according to (2) with the segment length $T_s$ in (1) being set to 300 (see footnote 2). For each genuine trial, a window covering 300 vectors was advanced along the vector sequence by one vector position. This arrangement produces approximately the same number of genuine trials and impostor attempts for each speaker.

Footnote 2: This is roughly equivalent to the length of 3 utterances. Thus, the results can be compared with those of [4].

LP-derived cepstral coefficients were used as acoustic features. For each utterance, the silent regions were removed by a silence detection algorithm based on the energy and zero crossing rate of the signal. The remaining signals were pre-emphasized by a filter with transfer function $1 - 0.95 z^{-1}$. Twelfth-order LP-derived cepstral coefficients were computed using a 28 ms Hamming window at a frame rate of 14 ms. These feature vectors were used to train a set of speaker models (EBF networks) with 12 inputs, two outputs, and 32 centers, where 8 centers were contributed by the corresponding speaker and the remaining 24 by the anti-speakers.

A. FAR and FRR Versus Thresholds

Fig. 1 depicts the FARs and FRRs as a function of thresholds for one of the 138 speakers. Some interesting results can be observed from these figures. First, Fig. 1 (Baseline) shows that there is a large displacement between the FAR curve corresponding to enrollment and that corresponding to verification when the anti-speakers' utterances were used to determine the FAR curve during enrollment. Second, when pseudo-impostors were used to obtain the FAR curve during enrollment, the displacement is considerably reduced, as shown in Fig. 1 (PIBTD). Third, the displacement is further reduced for SPIBTD (Fig. 1, SPIBTD panel), where feature vectors were randomly sampled from a large number of pseudo-impostors, suggesting that the verification performance of the system can be reliably predicted. Fig. 1 (SPIBTD) also suggests that the FAR curve provides a reliable means of determining the threshold.

[Figure 1: three panels titled "Baseline: Scheme II", "PIBTD: Scheme IV", and "SPIBTD: Scheme IV"; vertical axis: FAR or FRR; horizontal axis: threshold.]

Fig. 1. FARs and FRRs as a function of decision thresholds of a speaker during enrollment (solid) and verification (dots) using Baseline, PIBTD, and SPIBTD. The label "Th" denotes the a priori threshold found by the corresponding best threshold selection scheme.

TABLE I. Threshold selection schemes. All FAR and FRR curves are obtained at enrollment time. The last column lists the best threshold determination method(s) for each threshold selection scheme.

Scheme | Description | Best for
I | Selecting the zero crossings of the FRR curves as thresholds | None
II | Selecting the middle of the zero crossings of the FAR and FRR curves as thresholds | Baseline
III | Selecting the zero crossings of the FAR curves as thresholds | PIBTD
IV | Selecting the point in the FAR curve such that its corresponding FAR attains a pre-defined value | PIBTD & SPIBTD

B. Comparing Different Threshold Selection Schemes

B.1 Using Baseline

Fig. 2 compares the four selection schemes for the baseline method by plotting the a posteriori thresholds against the a priori thresholds corresponding to the 138 speakers in the YOHO corpus. The a posteriori thresholds were chosen to equalize FAR and FRR during verification. Fig. 3 plots the FARs and FRRs of the 138 speakers obtained with the different threshold selection schemes.

[Figure 2: four panels titled "Baseline: Scheme I" to "Baseline: Scheme IV".]

Fig. 2. A posteriori equal error thresholds versus a priori thresholds found by choosing the zero crossings of FRR curves (Scheme I), the middle of the zero crossings of FAR and FRR curves (Scheme II), the zero crossings of FAR curves (Scheme III), and the point at which FAR attains 0.5% (Scheme IV) as thresholds. The baseline method was used to obtain the FAR and FRR curves in all cases.

Fig. 2 (Scheme I) shows that most of the a priori thresholds are greater than the a posteriori ones. This suggests that choosing the zero crossing of the FRR curves (Scheme I) is likely to overestimate the thresholds, resulting in a high FRR during verification, as shown in Fig. 3 (Scheme I). On the other hand, Fig. 2 (Schemes III and IV) suggests that using Schemes III and IV is likely to yield underestimated thresholds, resulting in a high FAR during verification (see Fig. 3, Schemes III and IV). Therefore, a reasonable choice is to select the middle of the zero crossings of the FAR and FRR curves as the threshold, i.e. Scheme II. This scheme not only results in comparable a priori and a posteriori thresholds, as shown in Fig. 2 (Scheme II), but also minimizes FAR and FRR simultaneously for some of the speakers during verification, as shown in Fig. 3 (Scheme II). However, there are still many speakers having a low FAR but a very high FRR, or vice versa, suggesting that the baseline method is not very robust.

B.2 Using PIBTD

Fig. 4 plots the a posteriori thresholds versus the a priori thresholds for PIBTD using the four threshold selection schemes mentioned above. Similar to Baseline, choosing the zero crossings of FRR curves is likely to overestimate the thresholds, resulting in a high FRR, as shown in Fig. 5 (Scheme I). This overestimation is reduced by choosing the middle of the zero crossings of the FAR and FRR curves, as shown in Fig. 4 (Scheme II). However, Fig. 5 (Scheme II) shows that this still leads to a high FRR for many speakers. For PIBTD, both Scheme III and Scheme IV give a very good match between a priori and a posteriori thresholds, as shown in Fig. 4 (Schemes III and IV). They also give a reasonable trade-off between FARs and FRRs, as illustrated in Fig. 5 (Schemes III and IV).
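The first three selection schemes of Table I can be sketched as follows, given the per-segment scores $z_1 - z_2$ collected at enrollment time. This is a minimal sketch under stated assumptions: the thresholds are scanned on a uniform grid over $[-1, 1]$, the crossing point is approximated by the grid point where FAR and FRR are closest, and all function names and toy scores are hypothetical.

```python
def far(impostor_scores, threshold):
    """Fraction of impostor segments accepted (score > threshold)."""
    return sum(s > threshold for s in impostor_scores) / len(impostor_scores)

def frr(genuine_scores, threshold):
    """Fraction of genuine segments rejected (score <= threshold)."""
    return sum(s <= threshold for s in genuine_scores) / len(genuine_scores)

def select_threshold(genuine_scores, impostor_scores, scheme, steps=200):
    """Pick an a priori threshold with Scheme 'I', 'II', or 'III' of Table I.

    Scores are assumed to lie in [-1, 1], as z1 - z2 does.
    """
    grid = [-1.0 + 2.0 * i / steps for i in range(steps + 1)]
    # Lowest threshold at which no enrollment impostor segment is accepted.
    t_far = min(t for t in grid if far(impostor_scores, t) == 0.0)
    # Highest threshold at which no enrollment genuine segment is rejected.
    t_frr = max((t for t in grid if frr(genuine_scores, t) == 0.0), default=-1.0)
    if t_far > t_frr:
        # The curves cross: use the (approximate) crossing point instead.
        return min(grid, key=lambda t: abs(far(impostor_scores, t)
                                           - frr(genuine_scores, t)))
    choices = {"I": t_frr, "II": 0.5 * (t_far + t_frr), "III": t_far}
    return choices[scheme]

# Hypothetical enrollment scores with a gap between the two classes.
genuine = [0.605, 0.7, 0.8, 0.9]
impostor = [-0.5, -0.2, 0.1, 0.305]
thresholds = {s: select_threshold(genuine, impostor, s) for s in ("I", "II", "III")}
print(thresholds)
```

In this toy case both error rates are zero between roughly 0.31 and 0.60, so Scheme I sits at the upper end of that range, Scheme III at the lower end, and Scheme II in the middle, which mirrors the over- and under-estimation pattern discussed above.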

[Figure 3: four panels titled "Baseline: Scheme I" to "Baseline: Scheme IV".]

Fig. 3. FARs and FRRs corresponding to 138 speakers. All errors are based on the a priori thresholds determined by choosing the zero crossings of FRR curves (Scheme I), the middle of the zero crossings of FAR and FRR curves (Scheme II), the zero crossings of FAR curves (Scheme III), and the point at which the FAR attains 0.5% (Scheme IV) as thresholds. The baseline method was used in all cases.

[Figure 4: four panels titled "PIBTD: Scheme I" to "PIBTD: Scheme IV".]

Fig. 4. A posteriori equal error thresholds versus a priori thresholds found by choosing the zero crossings of FRR curves, the middle of the zero crossings of FAR and FRR curves, the zero crossings of FAR curves, and the point at which the FAR attains 0.5% as thresholds. PIBTD was used to obtain the FAR and FRR curves.

[Figure 5: four panels titled "PIBTD: Scheme I" to "PIBTD: Scheme IV".]

Fig. 5. FARs and FRRs corresponding to 138 speakers. All errors are based on the a priori thresholds determined by choosing the zero crossings of FRR curves, the middle of the zero crossings of FAR and FRR curves, the zero crossings of FAR curves, and the point at which the FAR attains 0.5% as thresholds. PIBTD was used in all cases.

B.3 Using SPIBTD

In SPIBTD, the FAR curves obtained during enrollment become very close to those obtained during verification (see Fig. 1, SPIBTD panel) because the speech is sampled from a large set of pseudo-impostors. Therefore, it makes sense to use the FAR curves obtained during enrollment to determine the threshold. The question that remains is which part of the FAR curve should be used for determining the threshold. To this end, we plot the FAR curves of a subset of the speakers in Fig. 6. A closer look at this figure reveals that the FAR curves become flat when the error rate is close to zero. This is caused by the fact that the feature vectors of the 45 pseudo-impostors form a much more scattered distribution in the feature space than those formed by 5 anti-speakers or 5 pseudo-impostors.
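The sampling step that produces this scattered distribution can be sketched as below. This is only an illustration: it assumes that feature vectors are pooled across speakers and drawn uniformly at random without replacement, which the paper does not spell out, and the speaker identifiers, vector counts, and function name are hypothetical.

```python
import random

def sample_pool(vectors_by_speaker, num_vectors, seed=0):
    """Draw num_vectors (speaker, vector) pairs at random from the pooled speakers."""
    pool = [(spk, vec) for spk, vecs in vectors_by_speaker.items() for vec in vecs]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(pool, num_vectors)

# Hypothetical data: 45 pseudo-impostors, 20 twelve-dimensional vectors each.
data_rng = random.Random(1)
pseudo_impostors = {
    f"spk{idx:02d}": [[data_rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(20)]
    for idx in range(45)
}

# Same vector budget as five speakers would give (5 x 20), spread over many speakers.
sampled = sample_pool(pseudo_impostors, num_vectors=100)
speakers_covered = {spk for spk, _ in sampled}
print(len(sampled), len(speakers_covered))
```

The number of vectors, and hence the enrollment cost, matches what five speakers would contribute, while the vectors come from far more speakers, which is the property SPIBTD relies on.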
Recall from (1) and (2) that verification decisions are based on the average difference between the two scaled network outputs. Therefore, if the pseudo-impostors' vectors spread over a wide region in the feature space, the chance of some of these vectors being close to the speaker's vectors becomes high. This phenomenon is illustrated in Fig. 7, where the distributions of the speaker's speech and the impostors' speech are assumed to be uni-modal. In the first case of Fig. 7, the false acceptance region is small because the pseudo-impostor patterns spread over a small area in the feature space, resulting in a small FAR (E1). On the other hand, the second case of Fig. 7 shows that if the pseudo-impostor patterns spread over a large area, more patterns will be falsely accepted. The consequence is that the FAR changes by an insignificant amount over a large range of threshold values, suggesting that using the zero crossing of FAR curves may result in unreliable thresholds.

[Figure 6: FAR plotted against the threshold.]

Fig. 6. FAR curves of a subset of the speakers obtained by using SPIBTD.

[Figure 7: two diagrams, each showing the speaker's patterns, the pseudo-impostor patterns, the false acceptance region, the decision boundaries for thresholds T1 and T2, and the corresponding FAR curve with error levels E1 and E2.]

Fig. 7. Diagrams showing the effect of having a scattered distribution of pseudo-impostor patterns on the FAR curves. The dashed lines represent the decision boundaries formed by setting the decision threshold to T1 and T2.

Fig. 6 shows that the threshold becomes very sensitive to the FAR when the latter falls below 0.5%. To overcome this difficulty, Scheme IV focuses on the region where the threshold is less sensitive to the FAR, and selects the threshold at which the FAR attains an application dependent level. In this work, the level was set to 0.5%.

Fig. 8 (Scheme I) is comparable to Fig. 2 (Scheme I) and Fig. 4 (Scheme I), suggesting that using the zero crossing of the FRR curve as the threshold is not appropriate for SPIBTD. The large number of speakers with a high FRR in Fig. 9 (Scheme I) also agrees with this observation. While choosing the middle of the zero crossings of the FAR and FRR curves as the threshold brings the a priori thresholds slightly closer to the a posteriori thresholds, it still causes an unacceptably high FRR for many speakers, as shown in Fig. 9 (Scheme II). Fig. 9 (Scheme III) suggests that selecting the zero crossings of FAR curves as thresholds reduces the number of speakers with a high FRR only slightly. This is because the thresholds are very sensitive to the FAR in the region of the zero crossings, resulting in overestimated thresholds. Comparisons between the Scheme III and Scheme IV panels of Fig. 8, as well as those of Fig. 9, reveal that choosing a threshold that produces a pre-defined FAR at enrollment time is the best approach, as it gives the best compromise between FAR and FRR.
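The sensitivity argument behind Scheme IV can be illustrated with a short sketch that compares the zero crossing of the FAR curve with the threshold at which the enrollment FAR attains a target level. It is a simplified illustration: the scores are synthetic, the function names are hypothetical, and the case where the FAR and FRR curves cross (in which the crossing point is used instead) is not handled.

```python
import math

def scheme_iv_threshold(pseudo_impostor_scores, target_far=0.005):
    """Smallest score-valued threshold whose enrollment FAR does not exceed target_far."""
    ranked = sorted(pseudo_impostor_scores, reverse=True)
    allowed = math.floor(target_far * len(ranked))  # accepted segments tolerated
    # Only scores strictly above ranked[allowed] are accepted: at most `allowed` of them.
    return ranked[allowed]

def far_at(scores, threshold):
    """Fraction of pseudo-impostor segments accepted at this threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Synthetic pseudo-impostor scores, then the same set with one stray high score.
scores = [i / 1000 - 0.5 for i in range(1000)]
with_outlier = scores + [0.95]

zero_crossing_shift = max(with_outlier) - max(scores)
scheme_iv_shift = scheme_iv_threshold(with_outlier) - scheme_iv_threshold(scores)
print(zero_crossing_shift, scheme_iv_shift)
```

A single stray pseudo-impostor segment moves the zero crossing of the FAR curve by a large amount but barely moves the 0.5% point, which is the behaviour that the flat tails in Fig. 6 suggest.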
We can see from Fig. 9 that the number of speakers with a high FRR is progressively reduced when we shift the focus from the FRR curves to the FAR curves (Scheme I to Scheme IV). Clearly, the sampling strategy of SPIBTD not only makes the verification performance more predictable, but also provides a reliable means of determining the thresholds. The main reason is that sampling the speech of a large number of impostors and anti-speakers produces a better representation of the impostor population and builds more robust speaker models.

C. Comparisons Based on the Best Threshold Selection Scheme

The above results show that Baseline, PIBTD, and SPIBTD require different threshold selection schemes to achieve the best trade-off between FAR and FRR (see Table I). It is of interest to compare the results of these methods by using their respective best threshold selection scheme (see footnote 3). To this end, we compare Fig. 2 (Scheme II), Fig. 4 (Schemes III and IV), and Fig. 8 (Scheme IV) in terms of robustness in threshold determination, and compare Fig. 3 (Scheme II), Fig. 5 (Schemes III and IV), and Fig. 9 (Scheme IV) in terms of the error rates obtained during verification. Evidently, the a priori and a posteriori thresholds obtained by the baseline method have the largest difference, causing a small FAR but a very large FRR, or vice versa, for most of the speakers, as shown in Fig. 3 (Scheme II). This makes the performance of the system difficult to predict. While the number of speakers with a high FRR is smaller in Fig. 5 (Schemes III and IV), this is achieved at the expense of increasing the number of speakers with a high FAR. Among all the methods, SPIBTD produces the most predictable system when it is combined with Scheme IV.

D. Comparisons Based on Average Error Rates

Table II summarizes the average FAR, FRR, and ERR (based on 138 speakers in the YOHO corpus) obtained by Baseline, PIBTD, and SPIBTD with different threshold selection schemes. The results show that using Scheme I results in a very high FRR during verification, although both FAR and FRR during enrollment are very small.
Scheme II is only appropriate for the baseline method, as it produces a comparatively high FRR during verification for PIBTD and SPIBTD. Scheme III produces a good compromise between FAR and FRR for PIBTD but not for Baseline and SPIBTD. Finally, Scheme IV gives the lowest FAR and FRR for SPIBTD as well as a good compromise between FAR and FRR for PIBTD.

Footnote 3: Here, we consider the scheme that produces the 'best' balance between FAR and FRR as the best scheme.

[Figure 8: four panels titled "SPIBTD: Scheme I" to "SPIBTD: Scheme IV".]

Fig. 8. A posteriori equal error thresholds versus a priori thresholds found by choosing the zero crossings of FRR curves, the middle of the zero crossings of FAR and FRR curves, the zero crossings of FAR curves, and the point at which FAR (during enrollment) attains 0.5% as thresholds. SPIBTD was used to obtain the FAR and FRR curves.

[Figure 9: four panels titled "SPIBTD: Scheme I" to "SPIBTD: Scheme IV".]

Fig. 9. FARs and FRRs corresponding to 138 speakers. All errors are based on the a priori thresholds determined by choosing the zero crossings of FRR curves, the middle of the zero crossings of FAR and FRR curves, the zero crossings of FAR curves, and the point at which FAR attains 0.5% as thresholds. SPIBTD was used in all cases.

One should bear in mind that the figures in Table II are based on the average over 138 speakers. Therefore, a good match between the average FAR and FRR does not mean that the scheme also produces a good match for individual speakers. For example, Fig. 5 (Scheme IV) shows that combining PIBTD and Scheme IV produces unmatched FAR and FRR for many speakers, although closely matched averages (3.41% versus 4.31%) can be obtained.

One may notice that the FARs during enrollment for Scheme IV in Table II are not equal to 0.5%. This is because the FAR and FRR curves of some speakers cross each other and the intersection point was chosen as the threshold, resulting in an FAR higher than 0.5%. Therefore, the average FAR is slightly higher than the pre-defined value.

Table II also demonstrates that SPIBTD is the best method in terms of equal error rate (ERR), suggesting that sampling a large set of pseudo-impostors and anti-speakers improves the capability of the EBF networks in modeling the anti-speakers and rejecting impostors.
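For reference, the a posteriori equal error threshold and the ERR used in these comparisons can be computed from verification scores along the following lines. This is a minimal sketch: it assumes a uniform grid search over $[-1, 1]$ and reports the mean of FAR and FRR at the grid point where they are closest; the function name and the toy scores are hypothetical.

```python
def equal_error(genuine_scores, impostor_scores, steps=2000):
    """Return (threshold, ERR) where FAR and FRR are closest on a grid over [-1, 1]."""
    best = None
    for i in range(steps + 1):
        t = -1.0 + 2.0 * i / steps
        fa = sum(s > t for s in impostor_scores) / len(impostor_scores)
        fr = sum(s <= t for s in genuine_scores) / len(genuine_scores)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            # Keep the first (lowest) threshold that achieves the smallest gap.
            best = (gap, t, 0.5 * (fa + fr))
    return best[1], best[2]

# Hypothetical verification scores with one overlapping pair.
genuine = [0.2, 0.5, 0.7, 0.9]
impostor = [-0.6, -0.3, 0.1, 0.4]
eer_threshold, err = equal_error(genuine, impostor)
print(eer_threshold, err)
```

Unlike the a priori schemes, this search uses the verification scores themselves, so it is only available after the fact and serves as the reference against which the a priori thresholds are judged.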
The ERR of SPIBTD is also about half of that of Higgins et al. [4] (.7% against 1.8%), suggesting that sampling the utterances from a large number of anti-speakers is able to produce a more robust speaker model. Note also that even our baseline method has a lower ERR as compared to that of Higgins et al. This implies that incorporating anti-speakers' speech into a speaker model has merits. The last column in Table II lists the obtained by setting all thresholds to zero. Recall from Section II that the average dierence of the two normalized network outputs is compared with a threshold for making a verication decision. Recall also that Baseline, PIBTD, and SPIBTD use the same set of speaker data (but dierent sets of antispeaker data) for building a speaker model. Therefore, for a xed threshold, the becomes an indicator of how good the anti-speakers are modeled by the EBF networks. The last column of Table II clearly shows that SPIBTD is more capable of modeling the anti-speakers. V. Comparing to Higgins' Model Our proposed speaker models and threshold determination methods have three advantages over Higgins' ones [4]. First, Higgins' model requires to select 5 closest speakers (among 137) to form the ratio speaker set during verication, causing a computation burden. Our methods, on the other hand, embed the features of the ratio speakers in the speaker models during enrollment. Rather than selecting 5 closest ratio speakers, our methods only need to sample the feature vectors of 45 anti-speakers for constructing a speaker model, which only take a fraction of the time. One may argue that we simply shift the com-

8 Enrollment Verication Zero Threshold Method % % % % ERR % % Scheme I Baseline Scheme II Scheme III Scheme IV Scheme I PIBTD Scheme II Scheme III Scheme IV Scheme I SPIBTD Scheme II Scheme III Scheme IV TABLE II Average error rates obtained by different methods. For Scheme IV, the pre-defined was set to.5%. putation burden from the verication sessions to the enrollment sessions. However, a low computation overhead during verication is certainly an advantage in real-world applications where real-time response is essential. During verication, our methods only require to compute two likelihood functions (one for the speaker class and the other for the anti-speaker class), whereas Higgins' model requires six (one for the speaker and another ve for the ratio speakers) likelihood functions to be evaluated. The second advantage of our proposed speaker models is that they are more robust in rejecting impostors. If the impostor's speech is closer to the speaker's speech than to the speech of the 5 ratio speakers, Higgins' model will accept the impostor. In our case, the speaker model is constructed by sampling the speech of the corresponding speaker and 45 anti-speakers. The latter forms a better representation of the impostor population as compared to using the speech of only 5 ratio speakers. Therefore, it is more likely that the impostor' speech is closer to the antispeakers' speech than to the speaker's speech, resulting in a rejection. The third advantage being that our methods provide several means of nding a threshold that can strike a balance between and, whereas Higgins' model makes no attempts to achieve this goal. Our experimental results show that the proposed methods not only produce an ERR that is only half of that obtained by Higgins et al. (.7% versus 1.8%), but also produce a good compromise between and. For example, Higgins et al. 
obtained error rates of 4.2% and .37% using 3 utterances per trial, whereas we obtained 3.94% and 1.12% using speech segments with length approximately equal to three utterances.

VI. Conclusions

This paper addresses the problem of determining a priori thresholds for phrase-prompted speaker verification. Conventional approaches have been compared with the proposed one. Experimental evaluations based on 138 speakers in the YOHO corpus have been carried out. It was shown that robust thresholds can be obtained by simulating an operation environment as close as possible to the real one. Our proposed method is able to predict the verification performance accurately by using enrollment data only, leading to more reliable thresholds. With the proposed method, a better balance between FARs and FRRs can be found.

References

[1] S. Furui. Cepstral analysis technique for automatic speaker verification. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-29(2):254-272.
[2] D. K. Burton. Text-dependent speaker verification using vector quantization source coding. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-35(2):133-143.
[3] J. M. Naik, L. P. Netsch, and G. R. Doddington. Speaker verification over long distance telephone lines. In Proc. ICASSP'89.
[4] A. Higgins, L. Bahler, and J. Porter. Speaker verification using randomized phrase prompting. Digital Signal Processing, 1:89-106.
[5] T. Matsui and S. Furui. Likelihood normalization for speaker verification using a phoneme- and speaker-independent model. Speech Communication, 17:109-116.
[6] C. S. Liu, C. H. Lee, B. H. Juang, and A. E. Rosenberg. Speaker recognition based on minimum error discriminative training. In Proc. ICASSP'94, volume 1, pages 325-328.
[7] A. E. Rosenberg, J. DeLong, C. H. Lee, B. H. Juang, and F. K. Soong. The use of cohort normalized scores for speaker verification. In Proc. ICSLP'92, volume 2, pages 599-602.
[8] A. E. Rosenberg, O. Siohan, and S. Parthasarathy. Speaker verification using minimum verification error training. In Proc. ICASSP'98, pages 105-108.
[9] J. B. Pierrot et al. A comparison of a priori threshold setting procedures for speaker verification in the CAVE project. In Proc. ICASSP'98, pages 125-128.
[10] M. W. Mak and C. K. Li. Elliptical basis function networks and radial basis function networks for speaker verification: A comparative study. In IJCNN'99, July 1999.
[11] D. A. Reynolds and R. C. Rose. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. on Speech and Audio Processing, 3(1):72-83.
[12] J. P. Campbell, Jr. Testing with the YOHO CD-ROM voice verification corpus. In ICASSP'95, pages 341-344, 1995.
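
Illustrative sketch. The idea summarized in the conclusions, choosing an a priori threshold so that the false acceptance rate measured on pseudo-impostor scores stays at or below a pre-defined level, might be sketched as follows. This is not the authors' implementation: the function names, the plain-number scores, and the convention that a claimant is accepted when the score exceeds the threshold are all assumptions made for illustration.

```python
import math


def far_at(threshold, impostor_scores):
    """Fraction of (pseudo-)impostor scores that would be falsely accepted."""
    return sum(s > threshold for s in impostor_scores) / len(impostor_scores)


def frr_at(threshold, speaker_scores):
    """Fraction of true-speaker scores that would be falsely rejected."""
    return sum(s <= threshold for s in speaker_scores) / len(speaker_scores)


def threshold_for_target_far(impostor_scores, target_far):
    """Smallest observed score usable as a threshold with FAR <= target_far.

    Hypothetical helper: accept a claimant when score > threshold.
    """
    if not impostor_scores:
        raise ValueError("need at least one pseudo-impostor score")
    ranked = sorted(impostor_scores, reverse=True)
    # Number of pseudo-impostor scores allowed to exceed the threshold.
    allowed = math.floor(target_far * len(ranked))
    if allowed >= len(ranked):
        # Every impostor may be accepted, so any threshold satisfies the target.
        return float("-inf")
    # At most `allowed` scores are strictly greater than ranked[allowed].
    return ranked[allowed]
```

Calling `threshold_for_target_far` on scores collected at enrollment time yields a threshold that is fixed before verification; `far_at` and `frr_at` can then be evaluated on held-out scores to see how well the simulated operating point carries over.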


More information

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008

The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National

More information

Australia s tertiary education sector

Australia s tertiary education sector Australia s tertiary education sector TOM KARMEL NHI NGUYEN NATIONAL CENTRE FOR VOCATIONAL EDUCATION RESEARCH Paper presented to the Centre for the Economics of Education and Training 7 th National Conference

More information

Research Update. Educational Migration and Non-return in Northern Ireland May 2008

Research Update. Educational Migration and Non-return in Northern Ireland May 2008 Research Update Educational Migration and Non-return in Northern Ireland May 2008 The Equality Commission for Northern Ireland (hereafter the Commission ) in 2007 contracted the Employment Research Institute

More information

An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District

An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District An Empirical Analysis of the Effects of Mexican American Studies Participation on Student Achievement within Tucson Unified School District Report Submitted June 20, 2012, to Willis D. Hawley, Ph.D., Special

More information

Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition

Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition Seltzer, M.L.; Raj, B.; Stern, R.M. TR2004-088 December 2004 Abstract

More information

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University

More information

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Jana Kitzmann and Dirk Schiereck, Endowed Chair for Banking and Finance, EUROPEAN BUSINESS SCHOOL, International

More information

Build on students informal understanding of sharing and proportionality to develop initial fraction concepts.

Build on students informal understanding of sharing and proportionality to develop initial fraction concepts. Recommendation 1 Build on students informal understanding of sharing and proportionality to develop initial fraction concepts. Students come to kindergarten with a rudimentary understanding of basic fraction

More information