Corinne Fredouille (2), Daniel Moraru (1), Sylvain Meignier (2), Laurent Besacier (1), Jean-François Bonastre (2)

THE NIST 2004 SPRING RICH TRANSCRIPTION EVALUATION: TWO-AXIS MERGING STRATEGY IN THE CONTEXT OF MULTIPLE DISTANT MICROPHONE BASED MEETING SPEAKER SEGMENTATION

Corinne Fredouille (2), Daniel Moraru (1), Sylvain Meignier (2), Laurent Besacier (1), Jean-François Bonastre (2)

(1) CLIPS-IMAG (UJF & CNRS) - BP Grenoble Cedex 9 - France
(2) LIA-Avignon - BP Avignon Cedex 9 - France
(daniel.moraru,laurent.besacier)@imag.fr
(sylvain.meignier,corinne.fredouille,jean-francois.bonastre)@lia.univ-avignon.fr

ABSTRACT

This paper presents the ELISA speaker segmentation approach applied to multiple audio channel meeting recordings in the framework of the NIST RT 04s (spring) meeting evaluation campaign. As was done for broadcast news (BN) data speaker segmentation, the ELISA meeting system involves two speaker segmentation systems developed individually by the CLIPS and LIA laboratories. The main originality consists in a two-axis merging strategy, proposed to deal with both multiple expert segmentation outputs and multiple microphone segmentation outputs. While the expert merging strategy did not really improve performance, the individual microphone segmentation merging strategy made it possible to provide a global segmentation output from several audio channels (microphones) with acceptable performance. The best system obtained a diarization error rate of 22.6% during the NIST RT 04s meeting evaluation.

1. INTRODUCTION

The goal of speaker diarization (or segmentation) is to segment an N-speaker audio document into homogeneous parts containing the voice of only one speaker (also called the speaker change detection process) and to associate the resulting segments by matching those belonging to the same speaker (clustering process). In speaker diarization, the intrinsic difficulty of the task increases according to the data concerned: (two-speaker) telephone conversations, broadcast news, meeting data. This paper is related to speaker diarization on meeting data in the framework of the NIST 2004 spring meeting Rich Transcription (RT 04s) evaluation.

Meeting data present three main specificities compared to BN data [1]. Firstly, the speech is fully spontaneous, highly interactive across participants, and presents a large number of disfluencies as well as speaker segment overlaps. Secondly, the meeting room recording conditions associated with distant (table) microphones lead to noisy recordings, including background noises, reverberations and distant speakers. Thirdly, meeting conversations are recorded in smart spaces where multiple sensors are used. Thus, the speaker diarization system has to treat multiple speech channels coming from multiple microphones. The choice of an efficient merging strategy in order to discard the irrelevant information is then an important issue. This last point is the core problem addressed in this paper.

Section 2 of this paper presents the two ELISA speaker diarization systems. Section 3 describes the strategies used to specifically treat meeting data by merging multiple microphone segmentation outputs and optionally multiple experts. Section 4 presents the experimental protocols and results. Finally, section 5 concludes this work.

2. SPEAKER SEGMENTATION SYSTEMS

Two speaker segmentation systems are involved in this work, developed individually by the CLIPS and LIA laboratories in the framework of the ELISA consortium [2]. Both of them participated in the Rich Transcription 2003 evaluation campaign (RT 03) for the speaker segmentation task on broadcast news data [3]. No particular tuning was done on either system for the RT 04s evaluation campaign, except for the use of a speech/non speech segmentation as a preliminary phase to deal with the specificities of meeting data.

2.1 Speech/non speech segmentation

The speech/non speech segmentation system consists of a silence detection based only on a bi-Gaussian modeling of the energy distribution, associated with a detection threshold. The minimal silence segment length is set to 0.5 s.

2.2 The LIA System

The LIA system is based on Hidden Markov Modeling (HMM) of the conversation. Each state of the HMM characterizes a speaker and the transitions model the changes between speakers. The speaker segmentation system is applied to the speech segments detected by the speech/non speech segmentation described in section 2.1. During the segmentation, the HMM is generated using an iterative process, which detects and adds a new state (i.e. a new speaker) at each iteration. This speaker detection process is then followed by a re-segmentation phase (iterative adaptation and decoding process) which refines the speaker segmentation. The entire speaker segmentation process is described at length in [3][4].
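The speech/non speech pre-processing of section 2.1, which supplies the speech segments used here, can be illustrated with a minimal Python sketch. It assumes one energy value per 10 ms frame and takes the midpoint between the two Gaussian means as the detection threshold. The paper specifies neither the frame rate of the detector nor how the threshold is derived, so both choices, like the function names `fit_two_gaussians` and `speech_segments`, are assumptions made for illustration only.

```python
import math


def fit_two_gaussians(values, iterations=25):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns [(mean, variance, weight), ...] sorted by increasing mean.
    """
    lo, hi = min(values), max(values)
    spread = max(hi - lo, 1e-6)
    means = [lo + 0.25 * spread, lo + 0.75 * spread]
    variances = [spread ** 2 / 16.0, spread ** 2 / 16.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each energy value.
        resp = []
        for x in values:
            p = [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0.0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)
    return sorted(zip(means, variances, weights))


def speech_segments(energies, min_silence_frames=50):
    """Return speech segments as (first_frame, end_frame_exclusive) pairs.

    min_silence_frames=50 corresponds to 0.5 s at an assumed 10 ms frame step.
    """
    (low_mean, _, _), (high_mean, _, _) = fit_two_gaussians(energies)
    threshold = 0.5 * (low_mean + high_mean)  # assumption: midpoint threshold
    is_silence = [e < threshold for e in energies]
    n = len(is_silence)
    # Silence runs shorter than the minimal length are relabelled as speech.
    i = 0
    while i < n:
        if is_silence[i]:
            j = i
            while j < n and is_silence[j]:
                j += 1
            if j - i < min_silence_frames:
                for k in range(i, j):
                    is_silence[k] = False
            i = j
        else:
            i += 1
    # Collect the remaining runs of speech frames.
    segments, start = [], None
    for idx, silent in enumerate(is_silence):
        if not silent and start is None:
            start = idx
        elif silent and start is not None:
            segments.append((start, idx))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments
```

Working on frame indices rather than seconds keeps the sketch free of rounding issues; multiplying the indices by the frame step gives segment boundaries in seconds.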

Concerning the front end processing, the signal is characterized by 20 linear cepstral features (LFCC) computed every 10 ms using a 20 ms window. The cepstral features are augmented by the energy. No frame removal or coefficient normalization is applied.

2.3 The CLIPS System

The CLIPS system is based on a BIC [5] (Bayesian Information Criterion) speaker change detector followed by a hierarchical clustering. The clustering stop condition is the estimation of the number of speakers using a penalized BIC criterion. The entire speaker segmentation process is described at length in [3][4]. Finally, the re-segmentation phase of the LIA system is also applied to the CLIPS segmentation for refinement (see footnote 1).

Like the LIA system, the CLIPS system is applied to the speech segments detected by the speech/non speech segmentation. The signal is characterized by 16 mel cepstral features (MFCC) computed every 10 ms on 20 ms windows using 56 filter banks. The cepstral features are then augmented by the energy. No frame removal or coefficient normalization is applied.

3. MEETING SPEAKER SEGMENTATION STRATEGIES

Since meetings are generally recorded with multiple distant microphones, the speaker segmentation task differs greatly from other domains like broadcast news or telephone conversations. Indeed, the speaker segmentation system has to deal with multiple speech signals (from the different distant microphones) when the objective is to provide a single meeting speaker segmentation output. Moreover, depending on the position of the distant microphone on the table, the quality of the signal may differ hugely from one microphone to another. For instance, the main speaker utterances may be caught by one or two distant microphones while the other microphones mainly provide background voices, long silence, or background noise only.

To deal with these different issues, two cooperative merging strategies are presented in this paper. The first one, called the expert merging strategy, aims at merging segmentations provided by different experts (two experts in this paper). It is applied independently on each recording issued from a distant microphone. The second one, called the Individual Microphone Segmentation Merging strategy, is used to produce a single speaker segmentation output from those obtained on each individual distant microphone. The application of both strategies, also referred to as the two merging axes (horizontal and vertical), is illustrated in figure 1.

Figure 1: Two cooperative merging strategies (horizontal and vertical merging combination).

Footnote 1: This combination of the CLIPS system and the LIA re-segmentation phase was also proposed as a merging strategy during the RT 03 evaluation [4] and obtained the best performance over all the participants, with a speaker diarization error rate of 12.88%.

3.1 Expert Merging Strategy

The idea of this strategy is to merge the segmentations issued from two experts (the CLIPS and LIA systems), computed independently on a given distant microphone. This strategy was already used by the LIA and CLIPS labs for the RT 03 speaker segmentation evaluation campaign on broadcast news data [4]. It relies on a frame based decision which consists in grouping the labels proposed by both systems at the frame level before applying a re-segmentation process (see figure 2). An example of the label merging approach is given below:

Frame i: Sys1 = S1, Sys2 = T4 -> label S1T4
Frame i+1: Sys1 = S2, Sys2 = T4 -> label S2T4

Figure 2: Expert merging strategy (label merge followed by re-segmentation, yielding a new segmentation).

This label merging method generates (before re-segmentation) a large set of virtual speakers composed of:

- Virtual speakers that have a large amount of data assigned. These speakers could be considered the correct hypothesis speakers;
- Virtual speakers generated by only one of the two systems, for example the speakers associated with only one short segment (~3 s up to 10 s). These hypothesis speakers could be suppressed (the weight of these speakers on the final scoring is marginal);
- Virtual speakers that have a smaller amount of data scattered between multiple small segments and that could be considered zones of indecision.
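The frame-level label merging described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: both expert segmentations are given as one label per frame over the same frames, and the helper names `merge_labels` and `virtual_speaker_frames` are hypothetical, not taken from the ELISA systems.

```python
from collections import Counter


def merge_labels(sys1, sys2):
    """Concatenate, frame by frame, the labels proposed by the two experts."""
    if len(sys1) != len(sys2):
        raise ValueError("both segmentations must cover the same frames")
    return [a + b for a, b in zip(sys1, sys2)]


def virtual_speaker_frames(merged):
    """Count how many frames are assigned to each virtual speaker."""
    return Counter(merged)


# Two experts labelling the same five frames.
lia = ["S1", "S1", "S1", "S2", "S2"]
clips = ["T4", "T4", "T4", "T4", "T4"]
merged = merge_labels(lia, clips)
counts = virtual_speaker_frames(merged)
```

Counting frames per virtual speaker is what makes the three groups listed above visible: labels with many frames, labels tied to a single short segment, and labels with little data scattered over many small segments.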
Based on these considerations, the LIA re-segmentation is then applied to the merged segmentation. During this iterative process, the virtual speakers whose total time is shorter than 3 s are deleted. The data of these deleted speakers is then redistributed among the remaining speakers during the next iteration. After the first iteration, the number of speakers is already drastically reduced, since speakers associated with indecision zones do not catch any data during the Viterbi decoding and are automatically removed.

However, the merging strategy generally cannot correct the wrong behaviour of initial systems that split a true speaker into two hypothesis speakers, each tied to a long segment. Suppose all systems agree on a long segment except one, which splits it into two parts. This would produce two virtual speakers (associated with long duration segments) after the label merging phase and, since no clustering is applied before re-segmentation, it leads to a "true" speaker split into two virtual speakers.

3.2 Individual Microphone Segmentation Merging Strategy

The goal of this strategy is to merge the multiple distant microphone segmentations into a single meeting speaker segmentation output. Since no single signal is representative of the overall meeting, this strategy must rely on some segment selection rules over the multiple distant microphone speaker segmentations.

To this end, a specific merging algorithm is proposed in this paper. Developed by the LIA and CLIPS labs, it relies on an iterative process which aims at detecting the longest speaker interventions over the set of distant microphone segmentations. This algorithm consists of 3 steps:

Step 1: selecting the longest speaker intervention over all microphone segmentation outputs taken separately. The longest speaker intervention means all the segments (contiguous or not) attributed to the speaker over a specific microphone segmentation. These segments are definitively attributed to a new speaker in the resulting segmentation.

Step 2: deleting in each distant microphone segmentation all the segments attributed to the new speaker at the end of step 1.

Step 3: checking whether unselected segments remain over all the distant microphone segmentations. If segments are still present and their total length is greater than 30 s, go back to step 1 for a new iteration; otherwise stop the process and assign the segments to a last speaker label (this last speaker can be seen as a trash speaker related to all the short remaining segments).

One rule is used during this iterative process: if the longest speaker intervention selected during step 1 is longer than 60% of the overall signal duration, it is not considered (unless it is the last available intervention). This rule aims at discarding some very long speaker segmentation outputs, which may result from poor individual microphone segmentations (the poor quality of an individual microphone segmentation may be due, for instance, to the major presence of background voice/noise over the microphone signal, involving a large rate of speech/non speech segmentation errors).

4. EXPERIMENTS AND RESULTS

4.1 Evaluation protocols

The RT 04s meeting evaluation campaign [6] proposed two main tasks: speech-to-text transcription (STT) and speaker segmentation (so-called diarization). For both tasks, different microphone conditions were available: multiple distant microphones, single distant microphone and individual head microphone (the latter was available for STT only).
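Returning to the Individual Microphone Segmentation Merging algorithm of section 3.2, its three steps and the 60% rule can be sketched in Python as follows. The sketch is an interpretation rather than the authors' implementation, and it rests on several assumptions: each microphone segmentation is given as one label per frame (`None` for non speech), all microphones cover the same frames, step 2 removes the selected time frames from every microphone, and the remaining length in step 3 is measured as the frames still labelled on at least one microphone. The function name `merge_microphones` and the output labels `spk0`, `spk1`, ..., `trash` are hypothetical.

```python
from collections import Counter


def merge_microphones(mic_labels, frame_step=0.01, min_remaining=30.0, max_fraction=0.6):
    """Merge per-microphone frame labels (None = non speech) into one labelling."""
    mics = {mic: list(labels) for mic, labels in mic_labels.items()}
    n_frames = len(next(iter(mics.values())))
    merged = [None] * n_frames
    limit = max_fraction * n_frames  # 60% of the overall duration, in frames
    next_id = 0
    while True:
        # Step 1: rank every (microphone, speaker) intervention by total length.
        candidates = sorted(
            ((count, mic, spk)
             for mic, labels in mics.items()
             for spk, count in Counter(x for x in labels if x is not None).items()),
            reverse=True,
        )
        if not candidates:
            break
        chosen = None
        for rank, (count, mic, spk) in enumerate(candidates):
            is_last = rank == len(candidates) - 1
            # 60% rule: skip an over-long intervention unless it is the last one.
            if count > limit and not is_last:
                continue
            chosen = (mic, spk)
            break
        mic, spk = chosen
        frames = [t for t, x in enumerate(mics[mic]) if x == spk]
        label = "spk%d" % next_id
        next_id += 1
        for t in frames:
            merged[t] = label
            # Step 2: delete the selected frames in every microphone segmentation.
            for labels in mics.values():
                labels[t] = None
        # Step 3: stop when 30 s or less remain; leftovers go to a trash speaker.
        left = [t for t in range(n_frames)
                if any(labels[t] is not None for labels in mics.values())]
        if len(left) * frame_step <= min_remaining:
            for t in left:
                merged[t] = "trash"
            break
    return merged
```

A frame-level representation is used because it reduces the deletion of step 2 to resetting list entries; an implementation working on (start, end) segments would need interval subtraction instead.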
This paper addresses only speaker segmentation over multiple distant microphones. This section describes the evaluation protocols used to measure the performance, presents some results and discusses the behaviour of the two-axis merging strategy.

Scoring

In order to measure performance, an optimum one-to-one mapping of reference speaker IDs to system output speaker IDs is computed, followed by a time-based speaker segmentation error rate. This scoring, proposed by NIST, is described in detail in the RT 04s evaluation plan [7]. Speaker segmentation performance is expressed in terms of speaker diarization error, comprising missed and false alarm speaker errors as well as speaker segmentation errors. NB: In this paper, the areas of overlap between speaker utterances are not scored.

Database

Since this work was done in the context of the RT 04s evaluation campaign, two meeting corpora are available, named in this paper the Dev corpus, for the development of systems, and the Eva corpus, for the evaluation. Each of them is composed of two 10-minute meeting excerpts from each of four different sites (CMU, ICSI, LDC, and NIST). Table 1 provides some details on the different corpora, including, for each meeting excerpt, the number of available distant microphones. The position of each distant microphone in the meeting room is available as further information and may be used to help the speaker segmentation process. Nevertheless, the approaches presented in this paper do not take advantage of this kind of information. Finally, as for any speaker segmentation evaluation, no prior information about the number of speakers and their identity is available.

Table 1: Number of distant microphones for each meeting of Dev and Eva corpora.
Columns: Dev (Meetings, micro nb), Eva (Meetings, micro nb).
Rows (Dev and Eva alike): CMU_, CMU_, ICSI_, ICSI_, LDC_, LDC_, NIST_, NIST_.

4.2 Results

Tables 2 and 3 provide the experimental results obtained on Dev and Eva corpora for the task of multiple distant microphone speaker segmentation. These results, expressed in terms of speaker diarization error rates, are given for three different systems:

- LIA+: the LIA speaker segmentation system applied on each individual distant microphone and followed by the Individual Microphone Segmentation Merging process;
- CLIPS+: the same process applied using the CLIPS speaker segmentation system, followed by the Individual Microphone Segmentation Merging process;
- Two axis merging: application of the expert merging strategy on the LIA and CLIPS segmentations, followed by the Individual Microphone Segmentation Merging process.

These results show:

- important differences in performance between the LIA and CLIPS systems on the same meeting file (e.g. 14.1% vs 53.4% for CMU_ on Dev corpus and 37.9% vs 19.1% for ICSI_ on Eva corpus);
- important differences in performance between the meetings (e.g. 7.4% vs 54.1% for the LIA system between LDC_ and NIST_ on Dev corpus);
- a significant difference in performance between Dev and Eva corpora (22.6% for the best overall error rate on Eva vs 28.3% on Dev) as well as a different behaviour of the systems between corpora (the LIA system is the best one on Dev and the CLIPS system the best one on Eva);
- a small performance improvement observed with the two axis merging strategy compared to the individual systems, and only on a few meeting files (e.g. 25.3% for two axis merging vs 28.4% for the LIA and 26.7% for the CLIPS for LDC_ ). Nevertheless, no gain is reached on the overall performance, compared to the best individual system.

4.3 Discussion

Given the difficulty of the task (compared to broadcast news or conversational telephone data), the performance obtained by the various systems is quite satisfying, especially on Eva corpus: 22.6% for the best system, to be compared with the 12.88% (see footnote 1) obtained on BN data during RT 03.

Nevertheless, the expert merging strategy applied individually on each individual microphone ("two axis merging") does not provide any additional performance gain compared to the best system. This result differs from the RT 03 ones [4], where a 16% relative decrease of the diarization error was observed (from 16.90% for the best individual system to 14.24% for the expert merging based system). Moreover, the behaviour of this strategy greatly depends on the quality of the individual segmentations, which themselves depend on the quality of each stream caught by each individual microphone. One explanation of the disappointing behaviour of the expert merging strategy may be that each expert is applied separately on a file with missing data (i.e. on each individual microphone recording). Thus, the performance of the two experts may be very different for the same meeting file, which is a well known drawback in fusion (it is generally accepted that an efficient fusion must be done between experts whose performance does not differ too much).

Table 4 shows the differences between the microphones taken independently, on two different meeting examples (see footnote 2). In the first example (LDC_ ), the result shows a large variability in terms of speaker error rates between the microphones (d3, d5, d6 ). In contrast, regarding the speech/non speech detection, a small variability between the microphones is noted. On this same meeting, the overall score is very close to the best individual microphone result, which performs quite well. The second example (NIST_ ) shows the inverse behaviour: comparable and quite reasonable speaker error rates over the set of microphones vs. high missed speech error rates with a large variability between the microphones. The differences observed between the meetings show how difficult it is to define an efficient merging strategy.

To summarize, some comments can be made regarding the results:

- If one microphone is able to catch the information from all the speakers (d2, LDC_ for example), this microphone could be used alone, achieving good performance (14.5% diarization error on the previous example, to be compared with 12.88% on BN data);
- If the information is present simultaneously on different microphones (with different signal qualities), the fusion process is disturbed, since it is not able to group together two (or more) parts of a given speaker detected on different microphones;
- To take advantage of the multiple microphones, it is necessary to focus on the useful information/speakers present in each recording, i.e. the speech/non speech process should delete the far speakers (low SNR parts, background voices).

Footnote 2: Speaker diarization error rates provided in table 4 for each distant microphone are computed by mapping each individual microphone segmentation to the corresponding single meeting reference segmentation.

Footnote 3: The speaker error rate is computed only on well detected speech segments (speech segments present both in the reference and in the system output).

5. CONCLUSION

We have presented the ELISA speaker segmentation approach applied to meeting speech data for the NIST RT 04s (spring) evaluation campaign. The best system obtained 28.3% diarization error on the development corpus (Dev) and 22.6% on the evaluation corpus (Eva), to be compared with the 12.88% obtained on BN data during the NIST RT 03 evaluation. A simple two-axis merging strategy was proposed to treat multiple expert segmentation outputs and multiple microphone segmentation outputs. While the expert merging strategy did not really improve performance, the individual microphone segmentation merging strategy made it possible to provide a global segmentation output from several audio channels (microphones) with acceptable performance. To be efficient when the speaker voices are caught differently by the microphones, our simple merging strategy needs microphone independent segmentations focused only on the well caught speakers (the background/far speakers should be suppressed). Despite the simplicity of the merging strategy proposed in this paper, the ELISA primary system presented at the RT 04s (spring) meeting evaluation obtained the best performance on the speaker diarization task.

6. REFERENCES

[1] http://
[2] I. Magrin-Chagnolleau, G. Gravier, and R. Blouet for the ELISA consortium, Overview of the ELISA consortium research activities, A Speaker Odyssey, pp. 67-72, Chania, Crete, June
[3] D. Moraru, S. Meignier, L. Besacier, J.-F. Bonastre, and I. Magrin-Chagnolleau, The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation. ICASSP 03, Hong Kong.
[4] D. Moraru, S. Meignier, C. Fredouille, L. Besacier, and J.-F. Bonastre, The ELISA consortium approaches in Broadcast News speaker segmentation during the NIST 2003 Rich Transcription evaluation. ICASSP 04, Montreal, Canada, May
[5] P. Delacourt and C. Wellekens, DISTBIC: a speaker-based segmentation for audio data indexing, Speech Communication, Vol. 32, No. 1-2, September
[6] http://nist.gov/speech/tests/rt/rt2004/spring/
[7] http://nist.gov/speech/tests/rt/rt2004/spring/documents/rt04s-meeting-eval-plan-v1.pdf

Table 2: Performance (in terms of speaker diarization error rate, in %) of the individual speaker segmentation systems (LIA and CLIPS) applied on each distant microphone and followed by the Individual Microphone Segmentation Merging strategy, and of the two axis merging strategy based system. Performance is given for each Dev corpus meeting signal and overall.
Columns: Dev Meeting Corpus, LIA+, CLIPS+, Two axis merging.
Rows: CMU_, CMU_, ICSI_, ICSI_, LDC_, LDC_, NIST_, NIST_, Overall (miss. and fa non speech err. = 5.6%).

Table 3: Performance (in terms of speaker diarization error rate, in %) of the individual speaker segmentation systems (LIA and CLIPS) applied on each distant microphone and followed by the Individual Microphone Segmentation Merging strategy, and of the two axis merging strategy based system. Performance is given for each Eva corpus meeting signal and overall.
Columns: Eva Meeting Corpus, LIA+, CLIPS+, Two axis merging.
Rows: CMU_, CMU_, ICSI_, ICSI_, LDC_, LDC_, NIST_, NIST_, Overall (miss. and fa non speech err. = 7%).

Table 4: Two examples of the behaviour of the Individual Microphone Segmentation Merging strategy for the LIA+ system. Error rates in %.
Columns: Micro; LDC_ (Mis+fa err. rate, Speaker err. rate); NIST_ (Mis+fa err. rate, Speaker err. rate).


Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

Data Fusion Models in WSNs: Comparison and Analysis

Data Fusion Models in WSNs: Comparison and Analysis Proceedings of 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1) Data Fusion s in WSNs: Comparison and Analysis Marwah M Almasri, and Khaled M Elleithy, Senior Member,

More information

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING

BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING Gábor Gosztolya 1, Tamás Grósz 1, László Tóth 1, David Imseng 2 1 MTA-SZTE Research Group on Artificial

More information

On the Formation of Phoneme Categories in DNN Acoustic Models

On the Formation of Phoneme Categories in DNN Acoustic Models On the Formation of Phoneme Categories in DNN Acoustic Models Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-

More information

LESSON: CHOOSING A TOPIC 2 NARROWING AND CONNECTING TOPICS TO THEME

LESSON: CHOOSING A TOPIC 2 NARROWING AND CONNECTING TOPICS TO THEME LESSON: CHOOSING A TOPIC 2 NARROWING AND CONNECTING TOPICS TO THEME Essential Questions: 1. How do topics in history relate to the History Day theme? 2. How do you make long histories concise? Objective:

More information

A study of speaker adaptation for DNN-based speech synthesis

A study of speaker adaptation for DNN-based speech synthesis A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,

More information

Preprint.

Preprint. http://www.diva-portal.org Preprint This is the submitted version of a paper presented at Privacy in Statistical Databases'2006 (PSD'2006), Rome, Italy, 13-15 December, 2006. Citation for the original

More information

ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS

ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS ACOUSTIC EVENT DETECTION IN REAL LIFE RECORDINGS Annamaria Mesaros 1, Toni Heittola 1, Antti Eronen 2, Tuomas Virtanen 1 1 Department of Signal Processing Tampere University of Technology Korkeakoulunkatu

More information

Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques

Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques Non intrusive multi-biometrics on a mobile device: a comparison of fusion techniques Lorene Allano 1*1, Andrew C. Morris 2, Harin Sellahewa 3, Sonia Garcia-Salicetti 1, Jacques Koreman 2, Sabah Jassim

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence

Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence INTERSPEECH September,, San Francisco, USA Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence Bidisha Sharma and S. R. Mahadeva Prasanna Department of Electronics

More information

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano. Graduate School of Information Science, Nara Institute of Science & Technology ISCA Archive SUBJECTIVE EVALUATION FOR HMM-BASED SPEECH-TO-LIP MOVEMENT SYNTHESIS Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano Graduate School of Information Science, Nara Institute of Science & Technology

More information

Lab 1 - The Scientific Method

Lab 1 - The Scientific Method Lab 1 - The Scientific Method As Biologists we are interested in learning more about life. Through observations of the living world we often develop questions about various phenomena occurring around us.

More information

The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance

The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance The Talent Development High School Model Context, Components, and Initial Impacts on Ninth-Grade Students Engagement and Performance James J. Kemple, Corinne M. Herlihy Executive Summary June 2004 In many

More information

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass

BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION. Han Shu, I. Lee Hetherington, and James Glass BAUM-WELCH TRAINING FOR SEGMENT-BASED SPEECH RECOGNITION Han Shu, I. Lee Hetherington, and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge,

More information

UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation

UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation UTD-CRSS Systems for 2012 NIST Speaker Recognition Evaluation Taufiq Hasan Gang Liu Seyed Omid Sadjadi Navid Shokouhi The CRSS SRE Team John H.L. Hansen Keith W. Godin Abhinav Misra Ali Ziaei Hynek Bořil

More information

A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK. Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren

A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK. Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK Yun Lei Nicolas Scheffer Luciana Ferrer Mitchell McLaren Speech Technology and Research Laboratory, SRI International,

More information

Moderator: Gary Weckman Ohio University USA

Moderator: Gary Weckman Ohio University USA Moderator: Gary Weckman Ohio University USA Robustness in Real-time Complex Systems What is complexity? Interactions? Defy understanding? What is robustness? Predictable performance? Ability to absorb

More information

Individual Differences & Item Effects: How to test them, & how to test them well

Individual Differences & Item Effects: How to test them, & how to test them well Individual Differences & Item Effects: How to test them, & how to test them well Individual Differences & Item Effects Properties of subjects Cognitive abilities (WM task scores, inhibition) Gender Age

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are:

Alpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are: Every individual is unique. From the way we look to how we behave, speak, and act, we all do it differently. We also have our own unique methods of learning. Once those methods are identified, it can make

More information

COMM370, Social Media Advertising Fall 2017

COMM370, Social Media Advertising Fall 2017 COMM370, Social Media Advertising Fall 2017 Lecture Instructor Office Hours Monday at 4:15 6:45 PM, Room 003 School of Communication Jing Yang, jyang13@luc.edu, 223A School of Communication Friday 2:00-4:00

More information

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH

IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 3, MARCH 2009 423 Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition George

More information

Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition

Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition Segmental Conditional Random Fields with Deep Neural Networks as Acoustic Models for First-Pass Word Recognition Yanzhang He, Eric Fosler-Lussier Department of Computer Science and Engineering The hio

More information

Speaker Identification by Comparison of Smart Methods. Abstract

Speaker Identification by Comparison of Smart Methods. Abstract Journal of mathematics and computer science 10 (2014), 61-71 Speaker Identification by Comparison of Smart Methods Ali Mahdavi Meimand Amin Asadi Majid Mohamadi Department of Electrical Department of Computer

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Evaluation Report Output 01: Best practices analysis and exhibition

Evaluation Report Output 01: Best practices analysis and exhibition Evaluation Report Output 01: Best practices analysis and exhibition Report: SEN Employment Links Output 01: Best practices analysis and exhibition The report describes the progress of work and outcomes

More information

Tap vs. Bottled Water

Tap vs. Bottled Water Tap vs. Bottled Water CSU Expository Reading and Writing Modules Tap vs. Bottled Water Student Version 1 CSU Expository Reading and Writing Modules Tap vs. Bottled Water Student Version 2 Name: Block:

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information

Noise-Adaptive Perceptual Weighting in the AMR-WB Encoder for Increased Speech Loudness in Adverse Far-End Noise Conditions

Noise-Adaptive Perceptual Weighting in the AMR-WB Encoder for Increased Speech Loudness in Adverse Far-End Noise Conditions 26 24th European Signal Processing Conference (EUSIPCO) Noise-Adaptive Perceptual Weighting in the AMR-WB Encoder for Increased Speech Loudness in Adverse Far-End Noise Conditions Emma Jokinen Department

More information

Procedia - Social and Behavioral Sciences 191 ( 2015 ) WCES 2014

Procedia - Social and Behavioral Sciences 191 ( 2015 ) WCES 2014 Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 191 ( 2015 ) 323 329 WCES 2014 Assessing Students Perception Of E-Learning In Blended Environment: An Experimental

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Speech Recognition by Indexing and Sequencing

Speech Recognition by Indexing and Sequencing International Journal of Computer Information Systems and Industrial Management Applications. ISSN 215-7988 Volume 4 (212) pp. 358 365 c MIR Labs, www.mirlabs.net/ijcisim/index.html Speech Recognition

More information

Eyebrows in French talk-in-interaction

Eyebrows in French talk-in-interaction Eyebrows in French talk-in-interaction Aurélie Goujon 1, Roxane Bertrand 1, Marion Tellier 1 1 Aix Marseille Université, CNRS, LPL UMR 7309, 13100, Aix-en-Provence, France Goujon.aurelie@gmail.com Roxane.bertrand@lpl-aix.fr

More information

WHY SOLVE PROBLEMS? INTERVIEWING COLLEGE FACULTY ABOUT THE LEARNING AND TEACHING OF PROBLEM SOLVING

WHY SOLVE PROBLEMS? INTERVIEWING COLLEGE FACULTY ABOUT THE LEARNING AND TEACHING OF PROBLEM SOLVING From Proceedings of Physics Teacher Education Beyond 2000 International Conference, Barcelona, Spain, August 27 to September 1, 2000 WHY SOLVE PROBLEMS? INTERVIEWING COLLEGE FACULTY ABOUT THE LEARNING

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation

The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation AUTHORS AND AFFILIATIONS MSR: Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova,

More information

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Jana Kitzmann and Dirk Schiereck, Endowed Chair for Banking and Finance, EUROPEAN BUSINESS SCHOOL, International

More information

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH

STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH STUDIES WITH FABRICATED SWITCHBOARD DATA: EXPLORING SOURCES OF MODEL-DATA MISMATCH Don McAllaster, Larry Gillick, Francesco Scattone, Mike Newman Dragon Systems, Inc. 320 Nevada Street Newton, MA 02160

More information

A Privacy-Sensitive Approach to Modeling Multi-Person Conversations

A Privacy-Sensitive Approach to Modeling Multi-Person Conversations A Privacy-Sensitive Approach to Modeling Multi-Person Conversations Danny Wyatt Dept. of Computer Science University of Washington danny@cs.washington.edu Jeff Bilmes Dept. of Electrical Engineering University

More information

Improvements to the Pruning Behavior of DNN Acoustic Models

Improvements to the Pruning Behavior of DNN Acoustic Models Improvements to the Pruning Behavior of DNN Acoustic Models Matthias Paulik Apple Inc., Infinite Loop, Cupertino, CA 954 mpaulik@apple.com Abstract This paper examines two strategies that positively influence

More information

Quarterly Progress and Status Report. VCV-sequencies in a preliminary text-to-speech system for female speech

Quarterly Progress and Status Report. VCV-sequencies in a preliminary text-to-speech system for female speech Dept. for Speech, Music and Hearing Quarterly Progress and Status Report VCV-sequencies in a preliminary text-to-speech system for female speech Karlsson, I. and Neovius, L. journal: STL-QPSR volume: 35

More information

A heuristic framework for pivot-based bilingual dictionary induction

A heuristic framework for pivot-based bilingual dictionary induction 2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,

More information

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,

More information

Using 'intsvy' to analyze international assessment data

Using 'intsvy' to analyze international assessment data Oxford University Centre for Educational Assessment Using 'intsvy' to analyze international assessment data Professional Development and Training Course: Analyzing International Large-Scale Assessment

More information

Reduce the Failure Rate of the Screwing Process with Six Sigma Approach

Reduce the Failure Rate of the Screwing Process with Six Sigma Approach Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Reduce the Failure Rate of the Screwing Process with Six Sigma Approach

More information

Houghton Mifflin Online Assessment System Walkthrough Guide

Houghton Mifflin Online Assessment System Walkthrough Guide Houghton Mifflin Online Assessment System Walkthrough Guide Page 1 Copyright 2007 by Houghton Mifflin Company. All Rights Reserved. No part of this document may be reproduced or transmitted in any form

More information

Speaker recognition using universal background model on YOHO database

Speaker recognition using universal background model on YOHO database Aalborg University Master Thesis project Speaker recognition using universal background model on YOHO database Author: Alexandre Majetniak Supervisor: Zheng-Hua Tan May 31, 2011 The Faculties of Engineering,

More information

Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing

Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing Using Articulatory Features and Inferred Phonological Segments in Zero Resource Speech Processing Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, and Alan W Black Carnegie Mellon University,

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Semi-Supervised Face Detection

Semi-Supervised Face Detection Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

ACTIVITY: Comparing Combination Locks

ACTIVITY: Comparing Combination Locks 5.4 Compound Events outcomes of one or more events? ow can you find the number of possible ACIVIY: Comparing Combination Locks Work with a partner. You are buying a combination lock. You have three choices.

More information

Abstractions and the Brain

Abstractions and the Brain Abstractions and the Brain Brian D. Josephson Department of Physics, University of Cambridge Cavendish Lab. Madingley Road Cambridge, UK. CB3 OHE bdj10@cam.ac.uk http://www.tcm.phy.cam.ac.uk/~bdj10 ABSTRACT

More information

Specification of the Verity Learning Companion and Self-Assessment Tool

Specification of the Verity Learning Companion and Self-Assessment Tool Specification of the Verity Learning Companion and Self-Assessment Tool Sergiu Dascalu* Daniela Saru** Ryan Simpson* Justin Bradley* Eva Sarwar* Joohoon Oh* * Department of Computer Science ** Dept. of

More information

SURVIVING ON MARS WITH GEOGEBRA

SURVIVING ON MARS WITH GEOGEBRA SURVIVING ON MARS WITH GEOGEBRA Lindsey States and Jenna Odom Miami University, OH Abstract: In this paper, the authors describe an interdisciplinary lesson focused on determining how long an astronaut

More information

Number of students enrolled in the program in Fall, 2011: 20. Faculty member completing template: Molly Dugan (Date: 1/26/2012)

Number of students enrolled in the program in Fall, 2011: 20. Faculty member completing template: Molly Dugan (Date: 1/26/2012) Program: Journalism Minor Department: Communication Studies Number of students enrolled in the program in Fall, 2011: 20 Faculty member completing template: Molly Dugan (Date: 1/26/2012) Period of reference

More information

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation

Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Role of Pausing in Text-to-Speech Synthesis for Simultaneous Interpretation Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Alistair Conkie AT&T abs - Research 180 Park Avenue, Florham Park,

More information

Speaker Recognition. Speaker Diarization and Identification

Speaker Recognition. Speaker Diarization and Identification Speaker Recognition Speaker Diarization and Identification A dissertation submitted to the University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences

More information

Dates and Prices 2016

Dates and Prices 2016 Dates and Prices 2016 ICE French Language Courses www.ihnice.com 27, Rue Rossini - 06000 Nice - France Phone: +33(0)4 93 62 60 62 / Fax: +33(0)4 93 80 53 09 E-mail: info@ihnice.com 1 FRENCH COURSES - 2016

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm

Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm Design Of An Automatic Speaker Recognition System Using MFCC, Vector Quantization And LBG Algorithm Prof. Ch.Srinivasa Kumar Prof. and Head of department. Electronics and communication Nalanda Institute

More information

Bullying Prevention in. School-wide Positive Behaviour Support. Information from this presentation comes from: Bullying in schools.

Bullying Prevention in. School-wide Positive Behaviour Support. Information from this presentation comes from: Bullying in schools. Bullying Prevention in School-wide Positive Behaviour Support Carmen Poirier and Kent McIntosh University of British Columbia National Association of School Psychologists Convention March 5 th, 2010 Information

More information