Unsupervised Estimation for Noisy-Channel Models


Markos Mylonakis, Khalil Sima'an
Language and Computation, University of Amsterdam, Pl. Muidergracht 24, 1018TV Amsterdam, Netherlands

Rebecca Hwa
Department of Computer Science, University of Pittsburgh, 210 S. Bouquet St., Pittsburgh, PA 15260, U.S.A.

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

Abstract

Shannon's Noisy-Channel model, which describes how a corrupted message might be reconstructed, has been the cornerstone for much work in statistical language and speech processing. The model factors into two components: a language model to characterize the original message and a channel model to describe the channel's corruptive process. The standard approach for estimating the parameters of the channel model is unsupervised Maximum-Likelihood estimation on the observation data, usually approximated using the Expectation-Maximization (EM) algorithm. In this paper we show that it is better to maximize the joint likelihood of the data at both ends of the noisy channel. We derive a corresponding bi-directional EM algorithm and show that it gives better performance than standard EM on two tasks: (1) translation using a probabilistic lexicon and (2) adaptation of a part-of-speech tagger between related languages.

1. Introduction

An influential paradigm in statistical natural language processing (NLP) is the noisy-channel model (Shannon & Weaver, 1949). It describes a communication process in which a sender emits the intended message $m$ through an imperfect communication channel, such that the sequence $o$ observed by the recipient is a noisy version of the original message. To reconstruct $m$ from $o$, one may postulate a set of hypotheses, $H(o)$, and compute the optimal Bayesian hypothesis

$$m^{*} = \arg\max_{m \in H(o)} P(m \mid o) = \arg\max_{m \in H(o)} P(m)\,P(o \mid m),$$

where $P(m)$ is called the language model and $P(o \mid m)$ the channel model. Many NLP problems can be framed in terms of the noisy-channel model. For example, in speech recognition, $o$ is an acoustic utterance heard by the recipient and $m$ is the speaker's intended message; in machine translation, $o$ is a sentence expressed in a foreign language, $s$ (source), $m$ is the intended message expressed in the recipient's native language, $t$ (target), and the channel model is a probabilistic translation lexicon (dictionary).

A major challenge in training the channel model for an NLP application is that the available data rarely contain explicit, in-depth mappings between $o$ and $m$. For instance, consider the problem of training a channel model for machine translation. While it may not be hard to find bilingual texts, the texts themselves do not specify how individual words in the source language are translated into words in the target language. Thus, the channel model $P(o \mid m)$ is usually explained by assuming a distribution over a hidden translation relation $a: m \to o$, so that $P(o \mid m) = \sum_{a} P(o, a \mid m)$ (Bahl et al., 1990; Brown et al., 1988). The parameters of the model can be estimated with the Expectation-Maximization (EM) algorithm (Baum et al., 1970; Dempster et al., 1977). However, this means that the parameters are fitted only to data from one side of the channel: the language model parameters depend solely on data from the message side, and the channel model parameters are chosen to maximize the likelihood of the data from the observable side of the channel alone. Because of weak language models, asymmetric channel models and sparse data, this approach leads to different estimates from each direction of the channel ($P(m)P(o \mid m)$ vs. $P(o)P(m \mid o)$). Some recent work (Zens et al., 2004; Liang et al., 2006) suggests that this could be suboptimal in practice and that the two directions of the channel should be reconciled.
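As a concrete illustration of the decoding rule above, the following sketch scores every candidate reconstruction with the product $P(m)\,P(o \mid m)$. This is not the authors' code; the function names (`hypotheses`, `lm_prob`, `channel_prob`) are illustrative placeholders.

```python
def decode(o, hypotheses, lm_prob, channel_prob):
    """Noisy-channel decoding sketch: argmax_{m in H(o)} P(m) * P(o|m).

    hypotheses(o)      -> iterable of candidate messages H(o)
    lm_prob(m)         -> language model probability P(m)
    channel_prob(o, m) -> channel model probability P(o|m)
    """
    best_m, best_score = None, float("-inf")
    for m in hypotheses(o):
        score = lm_prob(m) * channel_prob(o, m)  # P(m) P(o|m)
        if score > best_score:
            best_m, best_score = m, score
    return best_m
```

In practice the product would be computed in log space and the maximization carried out with dynamic programming rather than by enumeration.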

In this paper we explore methods of maximizing the likelihood of both the observable and the message sides of the training data simultaneously. We propose that the two directions of translation, $P(o)\sum_{a} P(a, m \mid o)$ and $P(m)\sum_{a} P(a, o \mid m)$, employ the same set of joint probabilities $P(o, a, m)$. This allows training on the joint data of messages and observations under Maximum-Likelihood. We extend the standard EM algorithm into a Bi-directional EM (Bi-EM) algorithm for re-estimating channel parameters. Unlike standard NLP applications of the noisy-channel model, our algorithm does not depend on a parallel corpus of messages and their corresponding corrupted observations $(m, o)$ as the training data; it is sufficient to have separate corpora of $m$ and $o$. This is especially beneficial for machine translation between languages for which bilingual texts are not abundant.

We present experiments comparing Bi-EM with the uni-directional EM on two tasks: (1) translation from one language to another using a probabilistic translation lexicon and two monolingual corpora, and (2) automatic adaptation of a part-of-speech (POS) tagger from a language for which there exists an annotated training corpus (written Modern Standard Arabic) to a related language (the spoken Levantine dialect) for which there is only a small, unannotated corpus of sentences. On both tasks, and under varying training conditions, the Bi-EM estimates give better system performance than standard (uni-directional) EM.

2. Background and Related Work

It is useful to think of the noisy-channel problem as a translation task: the observation $o$ is the source-language sentence $s$ and the message $m$ is the target-language sentence $t$. While channel models ($P(s \mid t)$) can be implemented in many ways, in this paper we consider only a probabilistic translation lexicon that bridges the source (observation) and target (message) texts. This choice does not impact the generality of the estimation algorithm presented, especially with regard to applications such as machine translation or speech recognition. Much work in Statistical Machine Translation (SMT) has been devoted to the estimation of lexicon probabilities. We briefly review the relevant literature as a background against which we present our algorithm.

2.1. Translation Probabilities in SMT

For a source sentence $s = (s_1, \ldots, s_n)$ and a target sentence $t = (t_1, \ldots, t_m)$, the objective of SMT can be expressed in the noisy-channel framework as

$$\arg\max_{t} p(t \mid s) = \arg\max_{t} p(s \mid t)\,p(t).$$

To learn the translation model, most SMT approaches require a large parallel corpus (see e.g. Brown et al., 1988; Koehn et al., 2003) in order to induce a hidden alignment $a$ between the words of each pair of sentences $s$ and $t$:

$$\arg\max_{t} p(t \mid s) = \arg\max_{t} \sum_{a} p(s, a \mid t)\,p(t).$$

To estimate the word-alignment probabilities and the lexicon probabilities, most work employs some form of the Expectation-Maximization algorithm.

2.2. Baseline Model

In contrast with work using parallel corpora, in (Koehn & Knight, 2000) as well as in this paper, only monolingual corpora (in both source and target languages) are available. Because the two corpora are not translations of each other, alignments between pairs of sentences by and large do not exist. Instead, we assume that we are provided with an ambiguous translation lexicon $L$ (which may be obtained from a bilingual dictionary). For every source word $s$, $L$ contains a set of translations $L(s)$, and vice versa (for a target word $t$ it contains a set $L(t)$). The goal is to estimate translation probabilities $p(s \mid t)$, the probability that a word $t$ translates as word $s \in L(t)$, regardless of context. Let the set $L(s)$ stand for the set of all possible target sentences that result from translating the (ordered) sequence of words in $s$, one by one,¹ using lexicon $L$.
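Concretely, $L(s)$ can be materialized as the Cartesian product of the per-word candidate sets. A minimal sketch follows; the `lexicon` dict and the example entries are ours, purely for illustration.

```python
from itertools import product

def sentence_hypotheses(s, lexicon):
    """Enumerate L(s): all target sentences obtained by replacing each
    source word s_i by one of its lexicon translations, preserving word
    order and sentence length (the one-to-one assumption of footnote 1)."""
    return product(*(lexicon[w] for w in s))

# Hypothetical example: with lexicon = {"house": ["Haus"],
# "bank": ["Bank", "Ufer"]}, sentence_hypotheses(("house", "bank"), lexicon)
# yields ("Haus", "Bank") and ("Haus", "Ufer").
```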
Koehn and Knight derive the following model:²

$$\arg\max_{t \in L(s)} p(t \mid s) = \arg\max_{t \in L(s)} p_{\theta}(s \mid t)\,p(t) \qquad (1)$$
$$= \arg\max_{t \in L(s)} p(t) \prod_{i=1}^{n} \theta(s_i \mid t_i) \qquad (2)$$

where $\theta$ stands for the translation lexicon probabilities, i.e. $p(s \mid t)$. This model employs a language model $p(t)$ over target sentences, trained on the target-language monolingual corpus $T$, and a translation model with lexicon probabilities $\theta(s_i \mid t_i)$. Using fixed language model estimates $p(t)$, the lexicon probabilities are estimated using EM over the source language corpus $S$.

¹ Thereby assuming the same word order and a one-to-one mapping between words, which also implies that sentence length is unchanged, i.e. $m = n$.
² The notation $p_{\theta}(\cdot)$ stands for the probability under model (parameters) $\theta$.
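To make equation (2) concrete, the sketch below enumerates the word-by-word hypotheses $t \in L(s)$ and scores each as $p(t)\prod_i \theta(s_i \mid t_i)$, under footnote 1's one-to-one, order-preserving assumption. It is a toy illustration, not the paper's implementation: `lexicon`, `lm_prob` and `theta` are assumed inputs, and the explicit enumeration is exponential in sentence length.

```python
from itertools import product

def best_translation(s, lexicon, lm_prob, theta):
    """Equation (2): argmax over t in L(s) of p(t) * prod_i theta(s_i | t_i).

    s                 -- source word sequence (tuple of words)
    lexicon[w]        -- candidate translations L(w) of source word w
    lm_prob(t)        -- fixed target language model p(t)
    theta[(s_i, t_i)] -- lexicon probability p(s_i | t_i)
    """
    best_t, best_score = None, 0.0
    # L(s): every word-by-word combination of candidate translations
    for t in product(*(lexicon[w] for w in s)):
        score = lm_prob(t)
        for s_i, t_i in zip(s, t):
            score *= theta[(s_i, t_i)]
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

With a Markov language model, the same maximization is carried out efficiently by Viterbi search over an HMM whose states are the candidate target words.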

Assuming an initial estimate $\theta_0$ for $\theta$, and denoting the estimate at iteration $r$ by $\theta_r$:

E-step$_r$: for every $s \in S$ and $t \in L(s)$:
$$q(t \mid s; \theta_r) := \frac{1}{Z(s; \theta_r)}\; p(t) \prod_{i=1}^{n} \theta_r(s_i \mid t_i)$$

M-step$_r$: maximize over $\theta$ to obtain $\theta_{r+1}$:
$$\theta_{r+1} := \arg\max_{\theta} \sum_{s \in S} \sum_{t \in L(s)} q(t \mid s; \theta_r) \log\left[\,p(t)\,p_{\theta}(s \mid t)\,\right]$$

where $Z(s; \theta_r) = \sum_{t \in L(s)} p(t) \prod_{i=1}^{n} \theta_r(s_i \mid t_i)$. The maximization at iteration $r$ (M-step$_r$) is calculated by relative-frequency estimates as follows:

$$\theta_{r+1}(s \mid t) = \frac{\sum_{s' \in S} \sum_{t' \in L(s')} q(t' \mid s'; \theta_r) \sum_{j} \delta[s'_j, s]\,\delta[t'_j, t]}{\sum_{s' \in S} \sum_{t' \in L(s')} q(t' \mid s'; \theta_r) \sum_{j} \delta[t'_j, t]}$$

where $\delta[x, y] = 1$ iff $x = y$, and zero otherwise. The actual implementation for Hidden Markov Models is known as the Baum-Welch or Forward-Backward algorithm (Baum et al., 1970).

2.3. Existing Bi-directional Methods

It has been observed in the SMT literature that combining the alignments estimated from the two possible directions of translation, $S \to T$ and $T \to S$, improves the precision of the alignment (Och & Ney, 2003). Reconciling the alignments of the two directions of translation culminates in the method of (Zens et al., 2004). This method employs two directional translation models, each with a hidden directional alignment model and a word-to-word lexicon. The crucial observation of Zens et al., shared with our approach, is that the conditional lexicon probabilities can be computed using joint estimates (see equations 4) from counts over the alignments obtained from either translation direction. Contrary to our approach, however, Zens et al. employ two separate Uni-EM algorithms to construct two probabilistic directional alignments. After each iteration of these Uni-EM algorithms, each of the directional alignments is used to acquire estimates of the joint counts for the lexicon word pairs. These joint counts are then interpolated together, leading to symmetrized lexicon probability estimates, which are in turn fed back into each of the separate Uni-EM algorithms. It is unclear what objective function of the data this method is optimizing. Furthermore, Zens et al. make unrealistic and unnecessary assumptions regarding the unigram counts in the two corpora.

More recently, (Liang et al., 2006) present Alignment by Agreement. The key idea is to employ the parallel corpus $\langle S, T\rangle$ for the estimation of two alignment models $\theta_{\rightarrow}$ and $\theta_{\leftarrow}$ (the two directions of translation) under an objective likelihood function of $\langle S, T\rangle$ that measures individual fit to the data as well as mutual agreement between these alignments:

$$L(S, T; \theta_{\rightarrow}) \times L(S, T; \theta_{\leftarrow}) \times L(S, T; \mathrm{Agr}(\theta_{\rightarrow}, \theta_{\leftarrow}))$$

where $L(X; \theta) = \prod_{x \in X} p_{\theta}(x)$ stands for the likelihood of parallel corpus $X$ (sentence pairs) under the model that employs alignment $\theta$, and $\mathrm{Agr}(a, b)$ measures the agreement between the two alignments $a$ and $b$ given $x \in X$ as the dot product of two probability vectors that range over all possible alignments between that pair (also called the set of generalized alignments). While the idea of agreement alignment is appealing, it is by definition not applicable in the present case, as we start out from a non-parallel corpus. Furthermore, because the lexicon is large (relative to sentence length), it is computationally prohibitive to employ the same measure of agreement (such as the dot product) between the two estimates of probabilities (per direction) over the subsets of the translation lexicon (the power set of the lexicon).

3. Noisy-Channel Estimators

We start out from the intuition that the independent estimation of the lexicon probabilities $p_{\theta_{\leftarrow}}(s \mid t)$ and $p_{\theta_{\rightarrow}}(t \mid s)$ yields empirical estimates that do not agree on the joint probability $p(s, t)$, i.e.

$$p(t)\,p_{\theta_{\leftarrow}}(s \mid t) \;\neq\; p(s)\,p_{\theta_{\rightarrow}}(t \mid s).$$

This inequality is expected due to the asymmetric statistics in $T$ and $S$, asymmetry in the translation lexicon, and weak language models.
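For reference, here is a toy version of the Uni-EM of section 2.2, whose independently run directional estimates produce exactly the disagreement just described. It enumerates $L(s)$ explicitly in place of the Forward-Backward recursions used in practice, so it only scales to short sentences and small lexica; all names are ours, and `theta` must initially cover (e.g. uniformly) every word pair licensed by the lexicon.

```python
import math
from collections import defaultdict
from itertools import product

def uni_em(S, lexicon, lm_prob, theta, iterations=10):
    """Uni-directional EM for lexicon probabilities theta[(s, t)] = p(s|t).

    S          -- source corpus (list of word tuples)
    lexicon[w] -- candidate translations L(w)
    lm_prob(t) -- fixed target language model p(t)
    """
    for _ in range(iterations):
        counts = defaultdict(float)  # expected counts of (s, t) word pairs
        totals = defaultdict(float)  # expected counts of target words t
        for s in S:
            hyps = list(product(*(lexicon[w] for w in s)))
            # E-step: q(t|s) proportional to p(t) * prod_i theta(s_i | t_i)
            weights = [lm_prob(t) * math.prod(theta[(si, ti)]
                                              for si, ti in zip(s, t))
                       for t in hyps]
            Z = sum(weights)  # Z(s; theta_r)
            for t, w in zip(hyps, weights):
                q = w / Z
                for si, ti in zip(s, t):
                    counts[(si, ti)] += q
                    totals[ti] += q
        # M-step: relative-frequency re-estimate of theta_{r+1}(s | t)
        theta = {pair: c / totals[pair[1]] for pair, c in counts.items()}
    return theta
```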
We hypothesize that the notion of agreement between the two models can be implemented by estimation under the constraint that consensus is achieved over this joint probability. A straightforward approach would be to take the weighted sum of the final EM estimates obtained over the two translation directions (each conducted on its own):

$$p(s, t) = \lambda\, p_{\theta_{\leftarrow}}(s, t) + (1 - \lambda)\, p_{\theta_{\rightarrow}}(s, t) \qquad (3)$$

where $\lambda$ could be, e.g., the ratio of corpus sizes. This leads to the re-estimates

$$p_{\theta}(s \mid t) = \frac{p(s, t)}{\sum_{s'} p(s', t)} \qquad\qquad p_{\theta}(t \mid s) = \frac{p(s, t)}{\sum_{t'} p(s, t')} \qquad (4)$$

While interpolating the estimates could be useful, we take a novel approach that aims at maximizing the joint-likelihood of the two corpora under a joint probability model $p_{\theta}(s, t) = \prod_{i=1}^{n} \theta(s_i, t_i)$, which coordinates two internally hidden conditional, directed translation models that both employ the same set of translation parameters $\theta$.
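Before developing that joint objective, the interpolation baseline of equations (3) and (4) is easy to state concretely. A minimal sketch, assuming dict-based probability tables (the names are ours):

```python
from collections import defaultdict

def interpolate_joints(j_fwd, j_bwd, lam):
    """Equation (3): weighted sum of the two directions' joint estimates.
    j_fwd, j_bwd: dicts mapping a word pair (s, t) to its joint p(s, t)."""
    pairs = set(j_fwd) | set(j_bwd)
    return {p: lam * j_fwd.get(p, 0.0) + (1 - lam) * j_bwd.get(p, 0.0)
            for p in pairs}

def conditionals(joint):
    """Equation (4): re-derive p(s|t) and p(t|s) from the joint table."""
    s_marg, t_marg = defaultdict(float), defaultdict(float)
    for (s, t), p in joint.items():
        s_marg[s] += p  # sum over t: marginal of s
        t_marg[t] += p  # sum over s: marginal of t
    p_s_given_t = {(s, t): p / t_marg[t] for (s, t), p in joint.items()}
    p_t_given_s = {(s, t): p / s_marg[s] for (s, t), p in joint.items()}
    return p_s_given_t, p_t_given_s
```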

Let $p_1(s)$ be a language model estimated from $S$, and analogously $p_2(t)$ from $T$; we rewrite the directional translation models in terms of a single set of lexicon parameters $\theta$:

$$\arg\max_{s} p(s \mid t) = \arg\max_{s}\; p_1(s)\, \frac{p_{\theta}(t, s)}{\sum_{t'} p_{\theta}(t', s)} \qquad\qquad \arg\max_{t} p(t \mid s) = \arg\max_{t}\; p_2(t)\, \frac{p_{\theta}(t, s)}{\sum_{s'} p_{\theta}(t, s')}$$

Stating the two models in terms of the same set of joint probabilities of words implies that the source and target corpora are assumed to have been generated from a single source: the joint lexicon probabilities. This allows us to state a new objective function, the Joint-Likelihood of two monolingual corpora:

$$\max_{\theta}\; L(T; \theta, p_1, L) \times L(S; \theta, p_2, L) \qquad (5)$$

$$L(X; \theta, p_k, \hat{L}) = \prod_{x \in X} \sum_{y \in \hat{L}(x)} L(x, y; \theta, p_k) \qquad\qquad L(x, y; \theta, p_k) = p_k(y)\, \frac{p_{\theta}(x, y)}{\sum_{x'} p_{\theta}(x', y)}$$

This statement of the objective function optimizes over $\theta$ the joint-likelihood of two monolingual corpora, each under its own likelihood function, which involves the other corpus. Crucially, the joint-likelihood function has the same form as the usual likelihood function, with the minor difference that the multiplication ranges over two rather than one corpus (each under its own translation direction). In light of this observation we can directly obtain a Bidirectional-EM algorithm that aims at the joint-likelihood, just in the same fashion the EM is obtained from standard maximum-likelihood. Let us define two corpora $C(S)$ and $C(T)$ (see figure 1): $C(S)$ is the corpus that consists of a pair $\langle s, t\rangle$ for every sentence $s \in S$ and every hypothesis $t \in L(s)$. Corpus $C(T)$ is defined analogously. Figure 2 shows the Bi-EM algorithm.

[Figure 1: corpus S together with its image of T under lexicon L forms C(S); corpus T together with its image of S under L forms C(T). Caption: The concatenation of complete source and target corpora results in a single complete corpus.]

Figure 2. Bi-EM algorithm:

E-step$_r$:
$$\forall \langle s, t\rangle \in C(T):\quad q_1(s, t; \theta_r) := p_1(s) \prod_{i=1}^{n} \frac{\theta_r(s_i, t_i)}{\sum_{t'} \theta_r(s_i, t')}$$
$$\forall \langle s, t\rangle \in C(S):\quad q_2(s, t; \theta_r) := p_2(t) \prod_{i=1}^{n} \frac{\theta_r(s_i, t_i)}{\sum_{s'} \theta_r(s', t_i)}$$

M-step$_r$: maximize over $\theta$ to obtain $\theta_{r+1}$:
$$\theta_{r+1} := \arg\max_{\theta} \sum_{\langle s,t\rangle \in C(T)} \underbrace{\frac{q_1(s, t; \theta_r)}{Z_1(t; \theta_r)} \log L(s, t; \theta, p_1)}_{A_r(s,t;\theta)} \;+\; \sum_{\langle s,t\rangle \in C(S)} \underbrace{\frac{q_2(s, t; \theta_r)}{Z_2(s; \theta_r)} \log L(t, s; \theta, p_2)}_{B_r(s,t;\theta)}$$

with $L(x, y; \theta, p) = p(x) \prod_{i=1}^{n} \theta(x_i, y_i) / \sum_{y'} \theta(x_i, y')$,

where $Z_1(t; \theta_r) = \sum_{s \in L(t)} q_1(s, t; \theta_r)$ and $Z_2(s; \theta_r) = \sum_{t \in L(s)} q_2(s, t; \theta_r)$ are unigram count estimates. The sum of the two sums in the M-step can be rearranged into a single sum if we precompute a single (complete) corpus $C_r$ that concatenates $C(S)$ with $C(T)$ and stores the expected frequencies ($A_r(s, t; \theta)$ or $B_r(s, t; \theta)$) with each pair, as

$$\log \mathrm{freq}_r(s, t; \theta) = \begin{cases} A_r(s, t; \theta) & \langle s,t\rangle \in C(T) \\ B_r(s, t; \theta) & \langle s,t\rangle \in C(S) \end{cases}$$

The M-step then becomes the M-step of a standard EM algorithm:

$$\theta_{r+1} := \arg\max_{\theta} \sum_{\langle s,t\rangle \in C_r} \log \mathrm{freq}_r(s, t; \theta)$$

Hence, the Bi-EM inherits the properties of the common (uni-directional/Uni-) EM algorithm, including convergence and a guarantee of a choice of $\theta$ that will not decrease the joint-likelihood after each iteration.
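A toy sketch of the Bi-EM loop of Figure 2 follows, under the same caveats as the Uni-EM sketch above (explicit hypothesis enumeration instead of Baum-Welch; all names are ours). `theta` must be initialized, e.g. uniformly, over every word pair licensed by the lexicon; the final renormalization here yields a joint distribution over word pairs, which differs from the paper's update only by a constant factor that cancels in the directional conditionals.

```python
import math
from collections import defaultdict
from itertools import product

def bi_em(S, T, lex_st, lex_ts, p1, p2, theta, iterations=10):
    """Bi-EM sketch. theta[(s, t)]: joint word-pair parameters shared by
    both translation directions. lex_st[s] = L(s) (target candidates),
    lex_ts[t] = L(t) (source candidates). p1/p2: source/target LMs."""
    for _ in range(iterations):
        by_s, by_t = defaultdict(float), defaultdict(float)
        for (s, t), p in theta.items():
            by_s[s] += p  # sum over t of theta(s, t)
            by_t[t] += p  # sum over s of theta(s, t)
        counts = defaultdict(float)

        # expanded corpus C(T): each t in T paired with every s in L(t);
        # q1(s,t) ~ p1(s) * prod_i theta(s_i,t_i) / sum_{t'} theta(s_i,t')
        for t in T:
            hyps = list(product(*(lex_ts[w] for w in t)))
            weights = [p1(s) * math.prod(theta[(si, ti)] / by_s[si]
                                         for si, ti in zip(s, t))
                       for s in hyps]
            Z1 = sum(weights)
            for s, w in zip(hyps, weights):
                for si, ti in zip(s, t):
                    counts[(si, ti)] += w / Z1

        # expanded corpus C(S): each s in S paired with every t in L(s);
        # q2(s,t) ~ p2(t) * prod_i theta(s_i,t_i) / sum_{s'} theta(s',t_i)
        for s in S:
            hyps = list(product(*(lex_st[w] for w in s)))
            weights = [p2(t) * math.prod(theta[(si, ti)] / by_t[ti]
                                         for si, ti in zip(s, t))
                       for t in hyps]
            Z2 = sum(weights)
            for t, w in zip(hyps, weights):
                for si, ti in zip(s, t):
                    counts[(si, ti)] += w / Z2

        # single M-step over the concatenated corpus C_r: renormalize
        total = sum(counts.values())
        theta = {pair: c / total for pair, c in counts.items()}
    return theta
```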

The actual update formula for the Bi-EM is:

$$q(s, t; \theta_r) = \begin{cases} q_1(s, t; \theta_r) & \langle s,t\rangle \in C(T) \\ q_2(s, t; \theta_r) & \langle s,t\rangle \in C(S) \end{cases}$$

$$\theta_{r+1}(s, t) = \frac{\sum_{\langle s',t'\rangle \in C_r} q(s', t'; \theta_r) \sum_{j} \delta[s'_j, s]\,\delta[t'_j, t]}{\sum_{\langle s',t'\rangle \in C_r} q(s', t'; \theta_r)}$$

Note that the Bi-EM takes only twice as much training time as the Uni-EM.

4. Implementation Detail

The core of both the Uni-EM estimation method (Koehn & Knight, 2000) and the present Bi-EM estimator is the Baum-Welch algorithm (Baum et al., 1970) for Hidden Markov Models (HMMs), which is known to be an EM algorithm (Dempster et al., 1977). This algorithm in its most general form employs the Forward-Backward calculations to update expected counts of transition (language model) and emission (lexicon) probabilities. In our setting we fix the language model (transition) estimates and re-estimate only the lexicon (emission) probabilities, because language models can be readily constructed from large monolingual data and there is no reason to re-estimate them. For the generation of the language models we used the CMU-Cambridge Toolkit (Clarkson & Rosenfeld, 1997), employing a first-order Markov model. For the Baum-Welch algorithm we implemented our own (Java) software package,³ which implements both the Uni- and Bi-EM algorithms. For POS tagging we employ the TnT tagger (Brants, 2000), which works with a 2nd-order HMM over POS tags and individual lexical (word-tag) probabilities.

5. Application I: Translation

Following (Koehn & Knight, 2000), our experiments are on translating noun sequences extracted from corpus sentences. As an absolute baseline we employ a translation model that assumes uniform lexicon probabilities (called the LM method). The actual baseline, however, is the standard EM (Koehn & Knight, 2000), subsequently called Uni-EM (uni-directional EM). We compare these baselines to the present Bi-EM algorithm (section 3). During training, the input to the estimation methods consists of a non-parallel English-German corpus pair and an ambiguous lexicon containing up to seven German translations for every English word.⁴ We initialize the lexicon parameters with a uniform distribution, both for Uni- and Bi-EM. For evaluation purposes, we embed the lexicon estimates within a simple word-to-word translation system (section 2.2) and evaluate the translation result against the translations available in a given parallel corpus. As (Koehn & Knight, 2000), we use German-to-English translation. As a test corpus we use 5106 word-translation pairs from 1850 noun sequences extracted from an equal number of sentences from de-news,⁵ which have been aligned down to the word level. We measure accuracy, the fraction of words whose translation matches the word used in the bitext. In addition, we also provide BLEU scores (Papineni et al., 2001) as an additional measure of translation quality.

5.1. Effect of Domain Mismatch

The different estimators operate under domain and/or genre mismatch between (1) source corpus, (2) target corpus, (3) lexicon, and (4) test corpus. We fix the lexicon and the test corpus throughout all experiments. Because the Bi-EM aims at the joint-likelihood of two corpora, a question may arise as to whether weakening the relatedness (in domain and/or genre) of the two corpora will affect the performance of Bi-EM relative to Uni-EM.

Highly related. The two corpora here consist of noun sequences from two non-overlapping sections of the Europarl (Koehn, 2005) parallel corpus (English-German). The baseline system using the LM method (uniform lexicon probabilities) achieves an accuracy of 63.11% (BLEU score …).

³ http://staff.science.uva.nl/~mmylonak
The following table lists Uni- vs. Bi-EM results:

#sentences   Uni-EM            Bi-EM
40K          72.01% (0.3896)   76.19% (0.4394)
75K          74.13% (0.4242)   77.34% (0.4660)
100K         74.99% (0.4300)   77.78% (0.4714)

Compared against the baseline (63.11% for the LM method), these numbers improve by up to 15% (or in fact a 40% error reduction). Bi-EM clearly outperforms the standard Uni-EM. It is evident from the results that the improved accuracy of the Bi-EM does not come from utilizing more data: Bi-EM trained on 40,000 English sentences and the same amount of German sentences significantly outperforms Uni-EM trained on 100,000 English sentences (and a German language model).

⁴ The lexicon was obtained by automatic word alignment of the Europarl corpus.
⁵ http://…/publications/de-news/

This is a strong indication that the Joint-Likelihood is a better objective function than the likelihood of a single corpus.

Less related. We use as training data newspaper text from the Gigaword corpus (English) and from the European Language Newspaper Text corpus (German), utilizing news stories coming from the same agencies and published during the same period (Associated Press, Agence France-Presse, May 1994 to December 1995). Unlike different sections of Europarl, this pair of corpora concerns news texts that originate from non-parallel sources and are in two different languages. We estimate translation probabilities using Uni-EM and Bi-EM, training with 100K sentences per language:

#sentences   Uni-EM            Bi-EM
100K         70.29% (0.3610)   72.80% (0.3809)

We notice again that the Bi-EM helps produce significantly more accurate translations. Interestingly, training Bi-EM on 100K sentences still gives better results than Uni-EM trained on 200K sentences (Uni-EM with 200K: 72.08% (0.3737)).

Distantly related. We also trained on a pair of distantly related corpora: the newspaper text from Gigaword (English) and the parliament proceedings from Europarl (German):

#sentences   Uni-EM            Bi-EM
100K         68.90% (0.3110)   70.98% (0.3303)

Bi-EM is still able to produce estimates that give more accurate translations than Uni-EM. Furthermore, Bi-EM trained on 100K sentences outperforms Uni-EM trained on 200K sentences (Uni-EM on 200K: 70.23% (0.3215)).

5.2. Smaller Target Language Data

We employ the corpora of section 5.1, this time varying the amount of training sentences from the target language (English) while maintaining a fixed training corpus of 100K German sentences (source). Figure 3 shows the average accuracies of Bi-EM as a function of target corpus size. Note that the zero point refers to Bi-EM trained on a target corpus of size zero, which is equivalent to Uni-EM. Interestingly, 81% of the accuracy increase of Bi-EM relative to Uni-EM is already obtained using only 25K sentences: 77.32% (0.4542). These accuracies are averages over 3 different non-overlapping sets of 25K English sentences.

[Figure 3. Bi-EM accuracy (y-axis: accuracy %) as target corpus size increases (x-axis: #K English sentences).]

6. Application II: Adapting Taggers

Part-of-Speech (POS) tagging is the task of classifying every word in a text into one POS category (e.g., verb, noun). Many machine learning techniques have been applied to POS tagging, including HMMs, Conditional Random Fields, Support Vector Machines, and Memory-Based Learning, just to name a few (Ratnaparkhi, 1996; Daelemans et al., 1996; Brants, 2000; Lafferty et al., 2001). Here we focus on the POS tagging of transcripts of a spoken Levantine Arabic dialect. Unlike Modern Standard Arabic (MSA), Arabic dialects are spoken but rarely ever written, which makes it virtually impossible to obtain MSA-dialect parallel corpora (see (Rambow et al., 2005)). Available is a manually tagged MSA corpus (approx. 564K words) (Maamouri et al., 2004) and a tiny, manually created translation lexicon⁶ that maps words between Levantine and MSA. Also available is a small Levantine corpus (approx. 31K words) consisting of two splits (18157 and … words, respectively). The task here is to utilize the MSA tagged corpus in order to automatically POS tag the Levantine side, using only unannotated Levantine sentences for training and the lexicon for translation. We embed the MSA POS tagger and the MSA-Levantine lexicon in the noisy-channel approach. Let $m = m_1 \ldots m_n$ be an MSA sentence and $l = l_1 \ldots l_n$ a Levantine sentence. On the MSA side we have a POS tag sequence $t = t_1 \ldots t_n$ associated with $m$.
We have two directions for the noisy channel:

$$P(m, t, l) = P(m, t)\, P(l \mid m, t) \qquad (6)$$
$$P(m, t, l) = P(l)\, P(m, t \mid l) \qquad (7)$$

where $P(m, t)$ is an MSA POS tagger, $P(l)$ is a Levantine language model, and the other two terms are channel models involving the translation lexicon in the two directions.

⁶ Originating from the JHU 2005 summer workshop, http://…. The lexicon has 321 entries, with on average approx. 1.5 Levantine words per MSA word; if averaged over ambiguous MSA words only, the ambiguity rises to 3.
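As a hedged illustration of equations (6) and (7), the following sketch scores a triple $(m, t, l)$ under each factorization. The `msa_tagger`, `lev_lm` and channel tables are assumed inputs, and the word-by-word channel decomposition anticipates the lexicon models defined just below.

```python
def score_msa_to_lev(m, t, l, msa_tagger, channel):
    """Equation (6): P(m, t, l) = P(m, t) * P(l | m, t), with the channel
    decomposed word by word as prod_i theta(l_i | m_i, t_i)."""
    p = msa_tagger(m, t)              # joint tagger probability P(m, t)
    for m_i, t_i, l_i in zip(m, t, l):
        p *= channel[(l_i, m_i, t_i)]  # theta(l_i | m_i, t_i)
    return p

def score_lev_to_msa(m, t, l, lev_lm, channel):
    """Equation (7): P(m, t, l) = P(l) * P(m, t | l), with the channel
    decomposed word by word as prod_i theta(m_i, t_i | l_i)."""
    p = lev_lm(l)                      # Levantine language model P(l)
    for m_i, t_i, l_i in zip(m, t, l):
        p *= channel[(m_i, t_i, l_i)]  # theta(m_i, t_i | l_i)
    return p
```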

Table 1. Adapting the MSA POS tagger to Levantine

Adaptation   Training Data   Accuracy
None         MSA only        70.48%
Uni-EM       MSA-to-Lev      75.93%
Uni-EM       Lev-to-MSA      77.88%
Bi-EM        MSA-and-Lev     78.25%

The 2nd-order HMM MSA POS tagger and the Levantine language model are both standard:⁷

$$P(m, t) = \prod_{i=1}^{n} P(t_i \mid t_{i-2}, t_{i-1})\, P(m_i \mid t_i) \qquad (8)$$
$$P(l) = \prod_{i=1}^{n} P(l_i \mid l_{i-1}) \qquad (9)$$

For equation 8, we train an off-the-shelf HMM POS tagger (Brants, 2000) on the MSA data (accuracy … over a 66K-word test set). We make two strong assumptions: (1) the Levantine POS tagger differs from the MSA POS tagger only in the lexical model, and (2) when a Levantine word translates into an MSA word-tag pair, the POS tag remains the same. The latter means that we extend the MSA-Levantine lexicon from pairs $\langle m, l\rangle$ into triples $\langle m, t, l\rangle$, where $t$ is any of the POS tags that co-occur with word $m$ in the tagged MSA corpus. A word found in both corpora but not in the lexicon is mapped to itself, and a word found in the Levantine corpus but in neither the MSA corpus nor the lexicon is mapped to all open-category POS tags.

For the two Uni-EM versions, the channel probability employs the probabilistic lexicon in the two directions: $P(l \mid m, t) = \prod_{i=1}^{n} \theta(l_i \mid m_i, t_i)$ and $P(m, t \mid l) = \prod_{i=1}^{n} \theta(m_i, t_i \mid l_i)$. For the Bi-EM we assume one (non-directional/joint) set of parameters $\theta(m, t, l)$ that underlies the two directional/conditional parameters, as done within the translation task (section 5). The estimate $\theta(m, t, l)$ is converted into a Levantine lexical model:

$$P(l \mid t) = \frac{\sum_{m} \theta(m, t, l)}{\sum_{m, l'} \theta(m, t, l')}.$$

This lexical model is used together with the 2nd-order Markov model over POS tags (trained on the MSA corpus) as a Levantine POS tagger.

Table 1 exhibits the results of the various POS taggers on the Levantine data, averaged over the two splits (the two Lev parts). The first row is the original MSA-trained POS tagger (70.48% accuracy, the percentage of correctly tagged test words). The second and third rows each correspond to an adapted MSA POS tagger using the Uni-EM estimates from one translation direction. Depending on the direction, Uni-EM achieves 18-25% fewer errors relative to the unadapted tagger. The Bi-EM adapted POS tagger (last row) commits 2-10% fewer errors than the Uni-EM directions (or about 27.5% fewer errors than the MSA POS tagger). Note that we have not included any external knowledge. In (Rambow et al., 2005), manual adaptation combined with EM leads to 77-78% accuracy on a modified version⁸ of the Levantine data. On that test material, our experiments show that the Bi-EM scores 82.30% accuracy (averaged over the two splits).

We think that two factors contribute to the fact that the Bi-EM improves over Uni-EM: (1) it combines statistics from the MSA POS tagger (one direction) with statistics from the Levantine language model (the other direction), and (2) because the lexicon is asymmetric, Uni-EM updates only those entries used in the assumed direction, whereas Bi-EM updates the lexicon entries used in both directions.

7. Conclusions

This paper aims at improved channel estimates from data at both ends of the noisy channel. We presented a Joint Maximum-Likelihood approach and extended the EM algorithm into a bi-directional EM for unsupervised estimation. We exemplified the utility of Bi-EM on two tasks: translation by lexicon probability estimates, and adaptation of a POS tagger from a resource-rich to a resource-poor language. Bi-EM delivers better results than the standard EM regardless of mismatch in domain or genre between the source and target corpora.
In future work we aim at utilizing the Bi-EM for porting more linguistic processing tools from a resource-rich to a resource-poor language in cases where no parallel corpora exist. We also think that the Bi-EM could be useful in statistical machine translation, in particular for obtaining improved translation model estimates. Whenever the channel model (lexicon) is asymmetric and/or the language models are weak, it makes more sense to employ Bi-EM than standard (uni-directional) EM for noisy-channel applications.

⁷ For brevity, any symbol $x_j$ where $j \leq 0$ is assumed to be the unique start symbol of a sentence.
⁸ Clitics are marked with disambiguating symbols.

Acknowledgements

Preliminary Bi-EM versions were explored by the second and third authors, together with Carol Nichols, during the 2005 JHU Summer Language Engineering Workshop. We thank the JHU organizers for the opportunity, the workshop participants for discussions and data, the ICML reviewers for comments, Andy Way and Hermann Ney for pointers to relevant literature, and Aspasia Beneti and Isaac Esteban for help with preliminary experiments. The first author is supported by a NUFFIC HSP Huygens scholarship HSP-HP.06/940-G, and the second author by NWO grant number ….

References

Bahl, L. R., Jelinek, F., & Mercer, R. L. (1990). A maximum likelihood approach to continuous speech recognition. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Baum, L., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist., 41, 164-171.

Brants, T. (2000). TnT: a statistical part-of-speech tagger. Proceedings of the Sixth Conference on Applied Natural Language Processing. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Brown, P., Cocke, J., Della Pietra, S., Jelinek, F., Mercer, R., & Roossin, P. (1988). A statistical approach to language translation. COLING-88.

Clarkson, P., & Rosenfeld, R. (1997). Statistical language modeling using the CMU-Cambridge toolkit. Proceedings ESCA Eurospeech.

Daelemans, W., Zavrel, J., Berck, P., & Gillis, S. (1996). MBT: A memory-based part of speech tagger generator. Proceedings of the Fourth Workshop on Very Large Corpora (ACL SIGDAT). Copenhagen, Denmark.

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.

Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. MT Summit.

Koehn, P., & Knight, K. (2000). Estimating word translation probabilities from unrelated monolingual corpora using the EM algorithm. AAAI/IAAI.

Koehn, P., Och, F. J., & Marcu, D. (2003). Statistical phrase-based translation. Proceedings of the Human Language Technology Conference 2003 (HLT-NAACL 2003). Edmonton, Canada.

Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA.

Liang, P., Taskar, B., & Klein, D. (2006). Alignment by agreement. Proceedings of the Human Language Technology Conference (HLT-NAACL 2006). New York.

Maamouri, M., Bies, A., Buckwalter, T., & Mekki, W. (2004). The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. Proceedings of NEMLAR.

Och, F. J., & Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29, 19-51.

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2001). BLEU: a method for automatic evaluation of machine translation. ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Morristown, NJ, USA: Association for Computational Linguistics.

Rambow, O., Chiang, D., Diab, M., Habash, N., Hwa, R., Sima'an, K., Lacey, V., Levy, R., Nichols, C., & Shareef, S. (2005). Parsing Arabic dialects (Technical Report). Johns Hopkins University 2005 Summer Workshop on Language Engineering.

Ratnaparkhi, A. (1996). A maximum entropy model for part-of-speech tagging. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.

Zens, R., Matusov, E., & Ney, H. (2004). Improved word alignment using a symmetric lexicon model. Proceedings of the 20th International Conference on Computational Linguistics (CoLing). Geneva, Switzerland.


More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

Regression for Sentence-Level MT Evaluation with Pseudo References

Regression for Sentence-Level MT Evaluation with Pseudo References Regression for Sentence-Level MT Evaluation with Pseudo References Joshua S. Albrecht and Rebecca Hwa Department of Computer Science University of Pittsburgh {jsa8,hwa}@cs.pitt.edu Abstract Many automatic

More information

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

The College Board Redesigned SAT Grade 12

The College Board Redesigned SAT Grade 12 A Correlation of, 2017 To the Redesigned SAT Introduction This document demonstrates how myperspectives English Language Arts meets the Reading, Writing and Language and Essay Domains of Redesigned SAT.

More information

LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano

LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES. Judith Gaspers and Philipp Cimiano LEARNING A SEMANTIC PARSER FROM SPOKEN UTTERANCES Judith Gaspers and Philipp Cimiano Semantic Computing Group, CITEC, Bielefeld University {jgaspers cimiano}@cit-ec.uni-bielefeld.de ABSTRACT Semantic parsers

More information

The Role of the Head in the Interpretation of English Deverbal Compounds

The Role of the Head in the Interpretation of English Deverbal Compounds The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Transfer Learning Action Models by Measuring the Similarity of Different Domains

Transfer Learning Action Models by Measuring the Similarity of Different Domains Transfer Learning Action Models by Measuring the Similarity of Different Domains Hankui Zhuo 1, Qiang Yang 2, and Lei Li 1 1 Software Research Institute, Sun Yat-sen University, Guangzhou, China. zhuohank@gmail.com,lnslilei@mail.sysu.edu.cn

More information

Enhancing Morphological Alignment for Translating Highly Inflected Languages

Enhancing Morphological Alignment for Translating Highly Inflected Languages Enhancing Morphological Alignment for Translating Highly Inflected Languages Minh-Thang Luong School of Computing National University of Singapore luongmin@comp.nus.edu.sg Min-Yen Kan School of Computing

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design

Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Session 2B From understanding perspectives to informing public policy the potential and challenges for Q findings to inform survey design Paper #3 Five Q-to-survey approaches: did they work? Job van Exel

More information

Annotation Projection for Discourse Connectives

Annotation Projection for Discourse Connectives SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation

More information

Semi-Supervised Face Detection

Semi-Supervised Face Detection Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University

More information

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages

Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer

More information