Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input

Cory Shain, The Ohio State University
William Bryce, University of Illinois at Urbana-Champaign, bryce2@illinois.edu
Lifeng Jin, The Ohio State University, jin.544@osu.edu
Victoria Krakovna, Harvard University, vkrakovna@fas.harvard.edu
Finale Doshi-Velez, Harvard University, finale@seas.harvard.edu
Timothy Miller, Boston Children's Hospital & Harvard Medical School, timothy.miller@childrens.harvard.edu
William Schuler, The Ohio State University, schuler@ling.osu.edu
Lane Schwartz, University of Illinois at Urbana-Champaign, lanes@illinois.edu

Abstract

This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM). We deploy this algorithm to shed light on the extent to which human language learners can discover hierarchical syntax through distributional statistics alone, by modeling two widely-accepted features of human language acquisition and sentence processing that have not been simultaneously modeled by any existing grammar induction algorithm: (1) a left-corner parsing strategy and (2) limited working memory capacity. To model realistic input to human language learners, we evaluate our system on a corpus of child-directed speech rather than typical newswire corpora. Results beat or closely match those of three competing systems.

1 Introduction

The success of statistical grammar induction systems (Klein and Manning, 2002; Seginer, 2007; Ponvert et al., 2011; Christodoulopoulos et al., 2012) seems to suggest that sufficient statistical information is available in language to allow grammar acquisition on this basis alone, as has been argued for word segmentation (Saffran et al., 1999). But existing grammar induction systems make unrealistic assumptions about human learners, such as the availability of part-of-speech information and access to an index-addressable parser chart, which are not independently cognitively motivated. This paper explores the possibility that a memory-limited incremental left-corner parser, of the sort independently motivated in sentence processing theories (Gibson, 1991; Lewis and Vasishth, 2005), can still acquire grammar by exploiting statistical information in child-directed speech.

2 Related Work

This paper bridges work on human sentence processing and syntax acquisition on the one hand and unsupervised grammar induction (raw-text parsing) on the other. We discuss relevant literature from each of these areas in the remainder of this section.

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by/4.0/
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, December 2016.

2.1 Human sentence processing and syntax acquisition

Related work in psycholinguistics and cognitive psychology has provided evidence that humans have a limited ability to store and retrieve structures from working memory (Miller, 1956; Cowan, 2001; McElree, 2001), and may therefore employ a left-corner-like strategy during incremental sentence processing (Johnson-Laird, 1983; Abney and Johnson, 1991; Gibson, 1991; Resnik, 1992; Stabler, 1994; Lewis and Vasishth, 2005). Schuler et al. (2010) show that nearly all naturally-occurring sentences can be parsed using no more than four disjoint derivation fragments in a left-corner parser, suggesting that general-purpose working memory resources are all that is needed to account for information storage and retrieval during online sentence processing. These findings motivate our left-corner parsing strategy and depth-bounded memory store.

An extensive literature indicates that memory abilities develop with age (see e.g. Gathercole, 1998 for a review). Newport (1990) proposed that limited processing abilities actually facilitate language acquisition by constraining the hypothesis space (the less-is-more hypothesis). This theory has been supported by a number of subsequent computational and laboratory studies (e.g., Elman, 1993; Goldowsky & Newport, 1993; Kareev et al., 1997) and parallels similar developments in the curriculum learning training regimen for machine learning (Bengio et al., 2009).¹ Research on the acquisition of syntax has shown that infants are sensitive to syntactic structure (Newport et al., 1977; Seidl et al., 2003) and that memory limitations constrain the learning of syntactic dependencies (Santelmann and Jusczyk, 1998). Together, these results suggest both (1) that the memory constraints in infants and young children are even more extreme than those attested for adults and (2) that these constraints impact and may even facilitate learning. By implementing these constraints in a domain-general computational model, we can explore the extent to which human learners might exploit distributional statistics during syntax acquisition (Lappin and Shieber, 2007).

2.2 Unsupervised grammar induction

The process of grammar induction learns the syntactic structure of a language from a sample of unlabeled text, rather than a gold-standard treebank. The constituent context model (CCM) (Klein and Manning, 2002) uses expectation-maximization (EM) to learn differences between observed and unobserved bracketings, and the dependency model with valence (DMV) (Klein and Manning, 2004) uses EM to learn distributions that generate child dependencies, conditioned on valence (left or right direction) in addition to the lexical head. Both of these algorithms induce on gold part-of-speech tag sequences.

A number of successful unsupervised raw-text syntax induction systems also exist. Seginer (2007) (CCL) uses a non-probabilistic scoring system and a dependency-like syntactic representation to bracket raw-text input. Ponvert et al. (2011) (UPPARSE) use a cascade of hidden Markov model (HMM) chunkers for unsupervised raw-text parsing. Christodoulopoulos et al. (2012) (BMMM+DMV) induce part-of-speech (PoS) tags from raw text using the Bayesian multinomial mixture model (BMMM) of Christodoulopoulos et al. (2011), induce dependencies from those tags using DMV, and iteratively re-tag and re-parse using the induced dependencies as features in the tagging process. In contrast to ours, none of these systems employ a left-corner parsing strategy or model working memory limitations.
3 Methods

Experiments described in this paper use a memory-bounded probabilistic sequence model implementation of a left-corner parser (Aho and Ullman, 1972; van Schijndel et al., 2013) to determine whether natural language grammar can be acquired on the basis of statistics in transcribed speech within human-like memory constraints. The model assumes access to episodic memories of training sentences, but imposes constraints on working memory usage during sentence processing. The core innovation of this paper is the adaptation of this processing model to Bayesian unsupervised induction using constrained priors.

¹ The less-is-more hypothesis has been a subject of controversy, however. See e.g. Rohde and Plaut (2003) for a critical review.

Figure 1: Trees and partial analyses for the sentence "We'll get you another one", taken from the training corpus. Derivation fragments are shown vertically stacked between words, using '/' to delimit top and bottom signs.

3.1 Left-corner parsing

Left-corner parsing is attractive as a sentence processing model because it maintains a very small number of disjoint derivation fragments during processing (Schuler et al., 2010), in keeping with human working memory limitations (Miller, 1956; Cowan, 2001; McElree, 2001), and correctly predicts difficulty in recognizing center-embedded, but not left- or right-embedded structures (Chomsky and Miller, 1963; Miller and Isard, 1964; Karlsson, 2007).

A left-corner parser maintains a sequence of derivation fragments a/b, a′/b′, ..., each consisting of an active category a lacking an awaited category b yet to come. It incrementally assembles trees by forking off and joining up these derivation fragments, using a pair of binary decisions about whether to use a word w to start a new derivation fragment (initially a complete category c):²

    a/b  w  ⇒  a/b  c      where b ⇒+ c ...;  c ⇒ w        (F = 1)
    a/b  w  ⇒  c           where a = c;  b ⇒ w              (F = 0)

and whether to use a grammatical inference rule to connect a complete category c to a previously disjoint derivation fragment a/b:

    a/b  c  ⇒  a/b′            where b ⇒ c b′                     (J = 1)
    a/b  c  ⇒  a/b  a′/b′      where b ⇒+ a′ ...;  a′ ⇒ c b′      (J = 0)

These two binary decisions have four possible outcomes in total: the parser can fork only (which increases the number of derivation fragments by one), join only (which decreases the number of derivation fragments by one), both fork and join (which keeps the number of derivation fragments the same), or neither fork nor join (which also preserves the number of derivation fragments). An example derivation of the sentence "We'll get you another one" is shown in Figure 1.

² Here, b ⇒+ c ... constrains c to be a leftmost descendant of b at some depth.
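As a concrete illustration of these four outcomes (a sketch of ours, not the authors' implementation), the following Python fragment tracks how each fork/join combination changes the size of a bounded store of derivation fragments; the grammar-dependent category relabeling is stubbed out, and the depth bound of two matches the experiments below.

def apply_fork_join(fragments, fork, join, new_fragment=None, depth_bound=2):
    # fragments: list of (active, awaited) pairs, shallowest first, deepest last.
    # fork/join: the two binary decisions made at each word (F and J above).
    # new_fragment: the (active, awaited) pair pushed when a forked fragment is kept.
    # Category relabeling within a fragment is omitted; only store size is tracked.
    fragments = list(fragments)
    if fork and join:            # F=1, J=1: new fragment integrates immediately; size unchanged
        pass
    elif fork and not join:      # F=1, J=0: word starts a new fragment; size grows by one
        fragments.append(new_fragment)
    elif not fork and join:      # F=0, J=1: completed fragment joins the one above it; size shrinks by one
        fragments.pop()
    else:                        # F=0, J=0: active and awaited categories transition in place
        pass
    assert len(fragments) <= depth_bound, "working-memory bound exceeded"
    return fragments

# e.g., starting from a single fragment S/VP, a fork-only step adds a second fragment:
# apply_fork_join([("S", "VP")], fork=True, join=False, new_fragment=("NP", "NP"))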

3.2 Probabilistic sequence model

A left-corner parser can be modeled as a probabilistic sequence model using hidden random variables at every time step for Active categories A, Awaited categories B, Preterminal or part-of-speech (POS) tags P, and an observed random variable W over Words. The model also makes use of two binary switching variables at each time step, F (for Fork) and J (for Join), that guide the transitions of the other states. These two binary switching variables yield four cases, 1/1, 1/0, 0/1 and 0/0, at each time step.

Let D be the depth of the memory store at position t in the sequence, and let the state q_t^{1..D} be the stack of derivation fragments at t, consisting of one active category a_t^d and one awaited category b_t^d at each depth d. The joint probability of the hidden state q_t^{1..D} and observed word w_t, given their previous context, is defined using Markov independence assumptions and the fork-join variable decomposition of van Schijndel et al. (2013), which preserves PCFG probabilities in incremental sentence processing:

    P(q_t^{1..D} w_t | q_{1..t-1}^{1..D} w_{1..t-1})  =  P(q_t^{1..D} w_t | q_{t-1}^{1..D})                              (1)
                                                       ≝  P(p_t w_t f_t j_t a_t^{1..D} b_t^{1..D} | q_{t-1}^{1..D})       (2)
                                                       =  P_θP(p_t | q_{t-1}^{1..D}) · P_θW(w_t | q_{t-1}^{1..D} p_t) ·
                                                          P_θF(f_t | q_{t-1}^{1..D} p_t w_t) · P_θJ(j_t | q_{t-1}^{1..D} p_t w_t f_t) ·
                                                          P_θA(a_t^{1..D} | q_{t-1}^{1..D} p_t w_t f_t j_t) ·
                                                          P_θB(b_t^{1..D} | q_{t-1}^{1..D} p_t w_t f_t j_t a_t^{1..D})    (3)

The part-of-speech p_t only depends on the lowest awaited category b_{t-1}^d at the previous time step, where d is the depth of the stack at the previous time step and q_∅ is an empty derivation fragment:

    P_θP(p_t | q_{t-1}^{1..D})  ≝  P_θP(p_t | d, b_{t-1}^d);   d = max{d′ | q_{t-1}^{d′} ≠ q_∅}                           (4)

The lexical item w_t only depends on the part-of-speech tag p_t at the same time step:

    P_θW(w_t | q_{t-1}^{1..D} p_t)  ≝  P_θW(w_t | p_t)                                                                    (5)

The fork decision f_t is assumed to be independent of previous state q_{t-1}^{1..D} variables except for the previous lowest awaited category b_{t-1}^d and part-of-speech tag p_t:

    P_θF(f_t | q_{t-1}^{1..D} p_t w_t)  ≝  P_θF(f_t | d, b_{t-1}^d, p_t);   d = max{d′ | q_{t-1}^{d′} ≠ q_∅}              (6)

The join decision j_t is decomposed into fork and no-fork cases depending on the outcome of the fork decision:

    P_θJ(j_t | q_{t-1}^{1..D} f_t p_t w_t)  ≝  { P_θJ(j_t | d, a_{t-1}^d, b_{t-1}^{d-1});   d = max{d′ | q_{t-1}^{d′} ≠ q_∅}   if f_t = 0
                                               { P_θJ(j_t | d, p_t, b_{t-1}^d);             d = max{d′ | q_{t-1}^{d′} ≠ q_∅}   if f_t = 1    (7)

When f_t = 1, that is, a fork has been created, the decision of j_t is whether to immediately integrate the newly forked derivation fragment and transition the awaited category above it (j_t = 1) or keep the newly forked derivation fragment (j_t = 0). When f_t = 0, that is, no fork has been created, the decision of j_t is whether to reduce a stack level (j_t = 1) or to transition both the active and awaited categories at the current level (j_t = 0).

Decisions about the active categories a_t^{1..D} are decomposed into fork- and join-specific cases depending on the previous state q_{t-1}^{1..D} and the current preterminal p_t. Since the fork and join outcomes only allow a single derivation fragment to be initiated or integrated, each case of the active category model only nondeterministically modifies at most one a_t^d variable from the previous time step:³

    P_θA(a_t^{1..D} | q_{t-1}^{1..D} f_t p_t w_t j_t)  ≝
        { ⟦a_t^{1..d-2} = a_{t-1}^{1..d-2}⟧ · ⟦a_t^{d-1} = a_{t-1}^{d-1}⟧ · ⟦a_t^{d..D} = a_∅⟧                      if f_t = 0, j_t = 1
        { ⟦a_t^{1..d-1} = a_{t-1}^{1..d-1}⟧ · P_θA(a_t^d | d, b_{t-1}^{d-1}, a_{t-1}^d) · ⟦a_t^{d+1..D} = a_∅⟧      if f_t = 0, j_t = 0
        { ⟦a_t^{1..d-1} = a_{t-1}^{1..d-1}⟧ · ⟦a_t^d = a_{t-1}^d⟧ · ⟦a_t^{d+1..D} = a_∅⟧                            if f_t = 1, j_t = 1
        { ⟦a_t^{1..d} = a_{t-1}^{1..d}⟧ · P_θA(a_t^{d+1} | d, b_{t-1}^d, p_t) · ⟦a_t^{d+2..D} = a_∅⟧                if f_t = 1, j_t = 0
      with d = max{d′ | q_{t-1}^{d′} ≠ q_∅} in each case                                                                                (8)

³ Here ⟦φ⟧ is a (deterministic) indicator function, equal to one when φ is true and zero otherwise.
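To make the factorization in Equations 1-3 concrete, here is a rough sketch (ours; the component model objects and their call signatures are hypothetical) of how the log probability of a single time step decomposes into the six conditional models:

import math

def step_log_prob(models, q_prev, p, w, f, j, a_new, b_new):
    # models is assumed to expose the six component distributions as callables
    # returning conditional probabilities; q_prev is the previous stack of
    # derivation fragments q_{t-1}^{1..D}.
    logp = 0.0
    logp += math.log(models.theta_P(p, q_prev))                      # Eq. 4
    logp += math.log(models.theta_W(w, p))                           # Eq. 5
    logp += math.log(models.theta_F(f, q_prev, p))                   # Eq. 6
    logp += math.log(models.theta_J(j, q_prev, p, f))                # Eq. 7
    logp += math.log(models.theta_A(a_new, q_prev, p, f, j))         # Eq. 8
    logp += math.log(models.theta_B(b_new, q_prev, p, f, j, a_new))  # Eq. 9 (below)
    return logp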

Figure 2: Graphical representation of the probabilistic left-corner parsing model expressed in Equations 6-9 across two time steps, with D = 2.

Decisions about the awaited categories b_t^{1..D} also depend on the outcome of the fork and join variables. Again, since the fork and join outcomes only allow a single derivation fragment to be initiated or integrated, each case of the awaited category model only nondeterministically modifies at most one b_t^d variable from the previous time step:

    P_θB(b_t^{1..D} | q_{t-1}^{1..D} f_t p_t w_t j_t a_t^{1..D})  ≝
        { ⟦b_t^{1..d-2} = b_{t-1}^{1..d-2}⟧ · P_θB(b_t^{d-1} | d, b_{t-1}^{d-1}, a_{t-1}^d) · ⟦b_t^{d..D} = b_∅⟧     if f_t = 0, j_t = 1
        { ⟦b_t^{1..d-1} = b_{t-1}^{1..d-1}⟧ · P_θB(b_t^d | d, a_{t-1}^d, a_t^d) · ⟦b_t^{d+1..D} = b_∅⟧               if f_t = 0, j_t = 0
        { ⟦b_t^{1..d-1} = b_{t-1}^{1..d-1}⟧ · P_θB(b_t^d | d, b_{t-1}^d, p_t) · ⟦b_t^{d+1..D} = b_∅⟧                 if f_t = 1, j_t = 1
        { ⟦b_t^{1..d} = b_{t-1}^{1..d}⟧ · P_θB(b_t^{d+1} | d, a_t^{d+1}, p_t) · ⟦b_t^{d+2..D} = b_∅⟧                 if f_t = 1, j_t = 0
      with d = max{d′ | q_{t-1}^{d′} ≠ q_∅} in each case                                                                                (9)

Thus, the parser has a fixed number of probabilistic decisions to make as it encounters each word, regardless of the depth of the stack. A graphical representation of this model is shown in Figure 2.

3.3 Model priors

Induction in this model follows the approach of Van Gael et al. (2008) by applying nonparametric priors over the active, awaited, and part-of-speech variables. This approach allows the model to learn not only the parameters of the model, such as what parts of speech are likely to be created from what awaited categories, but also the cardinality of how many active, awaited, and part-of-speech categories are present, in a fully unsupervised fashion. No labels are needed for inference, which alternates between inferring these unseen categories and the associated model parameters. The probabilistic sequence model defined above, augmented with priors, can be repeatedly sampled to obtain an estimate of the posterior distribution of its hidden variables given a set of observed word sequences.

Priors over the syntactic models are based on the infinite hidden Markov model (iHMM) used for part-of-speech tagging (van Gael et al., 2009). In that model, a hierarchical Dirichlet process HMM (Teh et al., 2006) is used to allow the observed number of states corresponding to parts of speech in the HMM to grow as the data requires. The hierarchical structure of the iHMM ensures that transition distributions share the same set of states, which would not be possible if we used a flat infinite mixture model.

A fully infinite version of this model uses nonparametric priors on each of the active, awaited, and part-of-speech variables, allowing the cardinality of each of these variables to grow as the data requires. Each model draws a base distribution from a root Dirichlet process, which is then used as a parameter to an infinite set of Dirichlet processes, one for each applicable combination of the conditioning variables a_{t-1}, b_{t-1}, p_t, j_t, f_t, a_t, and b_t:

    β_A ~ GEM(γ_A)                                                        (10)
    P_θA(a_t^d | d, b_{t-1}^{d-1}, a_{t-1}^d) ~ DP(α_A, β_A)              (11)
    P_θA(a_t^{d+1} | d, b_{t-1}^d, p_t) ~ DP(α_A, β_A)                    (12)
    β_B ~ GEM(γ_B)                                                        (13)
    P_θB(b_t^{d-1} | d, b_{t-1}^{d-1}, a_{t-1}^d) ~ DP(α_B, β_B)          (14)
    P_θB(b_t^d | d, a_{t-1}^d, a_t^d) ~ DP(α_B, β_B)                      (15)
    P_θB(b_t^d | d, b_{t-1}^d, p_t) ~ DP(α_B, β_B)                        (16)
    P_θB(b_t^{d+1} | d, a_t^{d+1}, p_t) ~ DP(α_B, β_B)                    (17)
    β_P ~ GEM(γ_P)                                                        (18)
    P_θP(p_t | d, b_{t-1}^d) ~ DP(α_P, β_P)                               (19)

where DP is the Dirichlet process and GEM is the stick-breaking construction for DPs (Sethuraman, 1994). Models at depth greater than one use the corresponding model at the previous depth as a prior.

3.4 Inference

Inference is based on the beam sampling approach employed in van Gael et al. (2009) for part-of-speech induction. This inference approach alternates between two phases in each iteration. First, given the distributions θ_F, θ_J, θ_A, θ_B, θ_P, and θ_W, the model resamples values for all the hidden states {q_t^d, p_t}. Next, given the state values {q_t^d, p_t}, it resamples each set of multinomial distributions θ_F, θ_J, θ_A, θ_B, θ_P, and θ_W. The sampler is initialized by conservatively setting the cardinalities of the number of active, awaited, and part-of-speech states we expect to see in the data set, randomly initializing the state space, and then sampling the parameters for each distribution θ_F, θ_J, θ_A, θ_B, θ_P, and θ_W given the randomly initialized states and fixed hyperparameters.

As noted by Van Gael et al. (2008), token-level Gibbs sampling in a sequence model can be slow to mix. Preliminary work found that mixing with token-level Gibbs sampling is even slower in this model due to the tight constraints imposed by the switching variables: it is technically ergodic, but exploring the state space requires many low-probability moves. Therefore, the experiments described in this paper use sentence-level sampling instead of token-level sampling, first computing forward probabilities for the sequence and then doing sampling in a backwards pass; resampling the parameters for the probability distributions only requires computing the counts from the sampled sequence and combining them with the hyperparameters.

To account for the infinite size of the state spaces, these experiments employ the beam sampler (Van Gael et al., 2008), with some modifications for computational speed. The standard beam sampler introduces an auxiliary variable u_t at each time step, which acts as a threshold below which transition probabilities are ignored. This auxiliary variable u_t is drawn from Uniform(0, p(q_t^{1..D} | q_{t-1}^{1..D})), so it will be between 0 and the probability of the previously sampled transition. The joint distribution over transitions, emissions, and auxiliary variables can be reduced so that the transition matrix is transformed into a boolean matrix, with a 1 indicating an allowed transition. Depending on the cut-off value u_t, the size of the instantiated transition matrix will be different for every time step. Values of u_t can be sampled for active, awaited, and POS variables at every time step, rather than a single u_t for the transition matrix.

It is possible to compile all the operations at each time step into a single large transition matrix, but computing this matrix is prohibitively slow for an operation that must be done at each time step in the data. To address this issue, the learner may interleave several iterations holding the cardinality of the instantiated space fixed with full beam-sampling steps in which the cardinality of the state space can change.
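For intuition, the priors in Equations 10-19 can be simulated with a truncated stick-breaking approximation (a sketch of ours using numpy, not the actual inference code): a shared base distribution β is drawn from GEM(γ), and each conditional transition distribution is then drawn from DP(α, β), approximated here by a finite Dirichlet with mean β.

import numpy as np

rng = np.random.default_rng(0)

def gem(gamma, truncation=50):
    # Truncated stick-breaking construction of a GEM(gamma) base distribution.
    sticks = rng.beta(1.0, gamma, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    beta = sticks * remaining
    return beta / beta.sum()          # renormalize the truncated weights

def dp_row(alpha, beta):
    # Approximate draw from DP(alpha, beta) over the truncated support:
    # a finite Dirichlet with mean beta and concentration alpha.
    return rng.dirichlet(alpha * beta)

# One shared base measure per variable type, then one transition row per
# conditioning context, e.g. P(a_t^d | d, b_{t-1}^{d-1}, a_{t-1}^d):
beta_A = gem(gamma=1.0)
theta_A_row = dp_row(alpha=0.5, beta=beta_A)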

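The auxiliary-variable trick described in Section 3.4 can be sketched as follows (our paraphrase of the beam sampler of Van Gael et al. (2008), not the authors' implementation): each time step draws a threshold below the probability of the previously sampled transition, and only transitions whose probability exceeds that threshold are instantiated in the forward pass.

import random

rand = random.Random(0)

def slice_thresholds(prev_transition_probs):
    # One auxiliary variable per time step: u_t ~ Uniform(0, p(q_t^{1..D} | q_{t-1}^{1..D})),
    # where prev_transition_probs holds the probability of the previously sampled
    # transition at each time step.
    return [rand.uniform(0.0, p) for p in prev_transition_probs]

def allowed_transitions(transition_probs, u_t):
    # Boolean mask of transitions kept at one time step: a transition survives
    # only if its probability exceeds the threshold u_t, so the instantiated
    # state space differs from time step to time step.
    return {pair: prob > u_t for pair, prob in transition_probs.items()}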
Figure 3: Log Probability (with punc). Figure 4: F-Score (with punc). Figure 5: Depth=2 Frequency (with punc).

When the cardinality of the state space is fixed, the learner can multiply out the states into one large, structured transition matrix that is valid for all time steps. The forward pass is thus reduced to an HMM forward pass (albeit one over a much larger set of states), vastly improving the speed of inference. Alternating between sampling the parameters of this matrix and the state values themselves corresponds to updating a finite portion of the infinite possible state space; by interleaving these finite steps with occasional full beam-sampling iterations, the learner is still properly exploring the posterior over models.

3.5 Parsing

There are multiple ways to extract parses from an unsupervised grammar induction system such as this. The optimal Bayesian approach would involve averaging over the values sampled for each model across many iterations, and then using those models in a Viterbi decoding parser to find the best parse for each sentence. Alternatively, if the model parameters have ceased to change much between iterations, the learner can be assumed to have found a local optimum. It can then use a single sample from the end of the run as its model and the analyses of each sentence in that run as the parses to be evaluated. This latter method is used in the experiments described below.

4 Experimental Setup

We ran the UHHMM learner for 4,000 iterations on the approximately 14,500 child-directed utterances of the Eve section of the Brown corpus from the CHILDES database (MacWhinney, 2000).⁴ To model the limited memory capacity of young language learners, we restricted the depth of the store of derivation fragments to two.⁵ The input sentences were tokenized following the Penn Treebank convention and converted to lower case. Punctuation was initially left in the input as a proxy for intonational phrasal cues (Seginer, 2007; Ponvert et al., 2011), then removed in a follow-up experiment.

⁴ We used 4 active states; 4 awaited states; 8 parts of speech; and parameter values 0.5 for α_a, α_b, and α_c, and 1.0 for α_f, α_j, and γ. The burn-in period was 50 iterations.
⁵ This limited stack depth permits discovery of interesting syntactic features like subject-aux inversion while modeling the severe memory limitations of infants (see §2.1). Greater depths are likely unnecessary to parse child-directed input (e.g., Newport et al., 1977).
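A minimal sketch of the input preparation described above (ours; the exact Penn Treebank tokenizer is not reproduced, so a simple regular expression stands in for it): utterances are lowercased and tokenized, and punctuation is either retained as a proxy for intonational phrasal cues or stripped for the no-punc condition.

import re

PUNCT = {".", ",", "!", "?", ";"}

def preprocess(utterance, keep_punctuation=True):
    # Lowercase one child-directed utterance and split it into tokens,
    # separating clitics like 'll and n't in rough PTB fashion.
    text = utterance.lower()
    tokens = re.findall(r"[a-z]+(?=n't)|n't|'[a-z]+|[a-z]+|[.,!?;]", text)
    if not keep_punctuation:
        tokens = [t for t in tokens if t not in PUNCT]
    return tokens

# preprocess("We'll get you another one.")
#   -> ['we', "'ll", 'get', 'you', 'another', 'one', '.']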

Table 1: Parsing accuracy on Eve with and without punctuation (phrasal cues) in the input, reporting unlabeled precision (P), recall (R), and F1 under the "With punc" and "No punc" conditions for UPPARSE; CCL; BMMM+DMV (directed); BMMM+DMV (undirected); UHHMM-4000, binary; UHHMM-4000, flattened; and the right-branching baseline. The UHHMM systems were given 8 PoS categories while the BMMM+DMV systems were given 45. UPPARSE and CCL do not learn PoS tags. Only the UHHMM systems model limited working memory capacity or incremental left-corner parsing.

To generate accuracy benchmarks, we parsed the same data set using the three competing raw-text induction systems discussed in §2: CCL (Seginer, 2007), UPPARSE (Ponvert et al., 2011),⁶ and both directed and undirected variants of BMMM+DMV (Christodoulopoulos et al., 2012).⁷ The BMMM+DMV system generates dependency graphs which are not directly comparable to our phrase-structure output, so we used the algorithm of Collins et al. (1999) to convert the BMMM+DMV output to the flattest phrase structure trees permitted by the dependency graphs. We evaluated accuracy against hand-corrected gold-standard Penn Treebank-style annotations for Eve (Pearl and Sprouse, 2013). All evaluations were of unlabeled bracketings with punctuation removed.⁸ Accuracy results reported for our system are extracted from arbitrary samples taken after convergence had been reached: iteration 4000 for the with-punc model, and iteration 1500 for the no-punc model (see Figures 3 and 6, respectively).

5 Results

Figures 3, 4, and 5 show (respectively) log probability, F-score, and depth=2 frequency by iteration for the UHHMM trained on data containing punctuation. As the figures show, the model remains effectively depth 1 until around iteration 3000, at which point it discovers depth 2, rapidly overgeneralizes it, then scales back to around 350 uses over the entire corpus. Around this time, parsing accuracy drops considerably. This result is consistent with the less-is-more hypothesis (Newport, 1990), since accuracy decreases near the point when the number of plausible hypotheses suddenly grows. In our system, we believe this is because the model reallocates probability mass to deeper parses. Nonetheless, as we show below, final results are state of the art.

We sampled parses from iteration 4000 of our learner for evaluation. As shown in Table 1, initial accuracy measures are worse than all four competitors. However, our system generates exclusively binary-branching output, while all competitors can produce the higher arity trees attested in the PTB-like evaluation standard (notice that our recall measure for the binary-branching output beats both CCL and UPPARSE). To correct this disadvantage, we flattened the UHHMM output by first converting binary trees to dependencies using a heuristic that selects for each parent the most frequently co-occurring child category as the head, then converting these dependencies back into phrase structures using the Collins et al. (1999) algorithm. As shown in Table 1, recall remains approximately the same while precision predictably improves, resulting in higher overall F-measures that beat or closely match those of all competing systems.⁹

⁶ Using the best cascaded parser settings from that work: probabilistic right-linear grammar with uniform initialization.
⁷ We ran both variants of the BMMM+DMV system for 10 generations, with 500 iterations of BMMM and 20 EM iterations of DMV per generation, as was done by Christodoulopoulos et al. (2012).
⁸ Note that while punctuation was removed for all evaluations, inclusion/removal of punctuation in the training data was an independent variable in our experiment.
⁹ It happens to be the case that these child-directed sentences are heavily right-branching, likely due to the simplicity and short length of child-directed utterances, and therefore the right-branching baseline (RB) outperforms all systems by a wide margin on this corpus. However, we argue that such utterances are a more realistic model of input to human language learners than newswire text, and therefore preferable for evaluation of systems that purport to model human language acquisition. Our system learns this directional bias from data, and does so at least as successfully as its competitors.
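The flattening procedure can be sketched as follows (our reconstruction of the heuristic described above, not the released code): the most frequently co-occurring child category is chosen as the head child for each parent category, each binary tree is converted to dependencies by percolating lexical heads, and a final step (omitted here) would rebuild the flattest phrase structure those dependencies permit, as in Collins et al. (1999).

from collections import Counter, defaultdict

def head_child_table(trees):
    # For each parent category, pick the most frequently co-occurring child category.
    counts = defaultdict(Counter)
    def visit(node):
        label, *children = node
        if len(children) == 2:               # binary internal node: (label, left, right)
            for child in children:
                counts[label][child[0]] += 1
                visit(child)
    for tree in trees:
        visit(tree)
    return {parent: c.most_common(1)[0][0] for parent, c in counts.items()}

def to_dependencies(node, heads, arcs=None):
    # Percolate lexical heads up a binary tree; emit (head_word, dependent_word) arcs.
    if arcs is None:
        arcs = []
    label, *children = node
    if len(children) == 1:                   # preterminal: (POS, word)
        return children[0], arcs
    left, right = children
    left_head, _ = to_dependencies(left, heads, arcs)
    right_head, _ = to_dependencies(right, heads, arcs)
    if heads.get(label) == left[0]:          # left child's category is the head child
        head, dependent = left_head, right_head
    else:
        head, dependent = right_head, left_head
    arcs.append((head, dependent))
    return head, arcs

# heads = head_child_table(induced_trees)            # induced_trees: sampled binary parses
# root_head, arcs = to_dependencies(induced_trees[0], heads)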

Figure 6: Log Probability (no punc). Figure 7: F-Score (no punc). Figure 8: Depth=2 Frequency (no punc).

Figures 6, 7, and 8 show (respectively) log probability, F-score, and depth=2 frequency by iteration for the UHHMM trained on data containing no punctuation. Somewhat surprisingly, the model discovers depth 2 and converges much more quickly than it did for the with-punc corpus, requiring fewer than 1000 iterations to converge. This is possibly due to the slight reduction in corpus size. As in the case of the with-punc trained learner, once depth 2 is discovered, the system quickly overgeneralizes, then converges in a consistent range (in this case around 250 uses of depth 2).

To evaluate accuracy on the punctuation-free data, we sampled parses from iteration 1500 of our learner. Results are given in Table 1. Binary UHHMM results are on par with UPPARSE, worse than CCL, and considerably worse than BMMM+DMV, while flattened UHHMM results show higher overall F-measures than both CCL and UPPARSE. BMMM+DMV suffers less in the absence of punctuation than the other systems (and therefore generally provides the best induction results on no-punc). The large drop in UHHMM accuracy with the removal of punctuation provides weak evidence for the use of intonational phrasal cues in human syntax acquisition.

While the BMMM+DMV results are on par with ours, it is important to note that we used a severely restricted number of categories in order to improve computational efficiency. For example, our system was given 8 PoS tags to work with, while BMMM+DMV was given 45. Finer-grained state spaces in a more efficient implementation of our learner will hopefully improve upon the results presented here.

Finally, it is interesting to observe that the uses of depth 2 shown in Figures 5 and 8 are in general linguistically well-motivated. They tend to occur in subject-auxiliary inversion, ditransitive, and contraction constructions, in which depth 2 is often necessary in order to bracket auxiliary+subject, verb+object, and verb+contraction together, as illustrated in Figure 9. Unfortunately, due to the flat representation of these constructions in the gold standard trees, this insight on the part of our learner is not reflected in the accuracy measures in Table 1.

Figure 9: Actual parses from UHHMM-4000 (with punctuation), illustrating the use of depth 2 for (1) subject-auxiliary inversion, (2) ditransitives, and (3) contractions.

6 Conclusion

This paper presented a grammar induction system that models the working memory limitations of young language learners and employs a cognitively plausible left-corner incremental parsing strategy, in contrast to existing raw-text induction systems. The fact that our system can model these aspects of human language acquisition and sentence processing while achieving the competitive results shown here on a corpus of child-directed speech indicates that humans can in principle learn a good deal of natural language syntax from distributional statistics alone. It also shows that modeling cognition more closely can match or improve on existing approaches to the task of raw-text grammar induction.

In future research, we intend to make use of parallel processing techniques to increase the speed of inference and (1) allow the system to infer the optimal number of states in each component of the model, permitting additional granularity that might enable it to discover subtler patterns than is possible with our currently-restricted state inventories, and (2) allow the system to make use of depths 3 and 4, modeling the working memory capacities of older learners.

Acknowledgements

The authors would like to thank the anonymous reviewers for their comments. This project was sponsored by Defense Advanced Research Projects Agency award #HR. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

Steven P. Abney and Mark Johnson. 1991. Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research, 20(3).

Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation and Compiling, Vol. 1: Parsing. Prentice-Hall, Englewood Cliffs, New Jersey.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41-48, Montreal.

Noam Chomsky and George A. Miller. 1963. Introduction to the formal analysis of natural languages. In Handbook of Mathematical Psychology. Wiley, New York, NY.

Christos Christodoulopoulos, Sharon Goldwater, and Mark Steedman. 2011. A Bayesian mixture model for part-of-speech induction using multiple features. In Proceedings of EMNLP, Edinburgh, Scotland.

Christos Christodoulopoulos, Sharon Goldwater, and Mark Steedman. 2012. Turning the pipeline into a loop: Iterated unsupervised dependency parsing and PoS induction. In NAACL-HLT Workshop on the Induction of Linguistic Structure, pages 96-99, Montreal, Canada.

Michael Collins, Jan Hajič, Lance A. Ramshaw, and Christoph Tillmann. 1999. A statistical parser for Czech. In Proceedings of ACL.

Nelson Cowan. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24.

Jeffrey L. Elman. 1993. Learning and development in neural networks: The importance of starting small. Cognition, 48.

Susan E. Gathercole. 1998. The development of memory. Journal of Child Psychology and Psychiatry, 39:3-27.

Edward Gibson. 1991. A computational theory of human linguistic processing: Memory limitations and processing breakdown. Ph.D. thesis, Carnegie Mellon.

Boris Goldowsky and Elissa Newport. 1993. Modeling the effects of processing limitations on the acquisition of morphology: the less is more hypothesis. In Jonathan Mead, editor, Proceedings of the 11th West Coast Conference on Formal Linguistics.

Philip N. Johnson-Laird. 1983. Mental models: Towards a cognitive science of language, inference, and consciousness. Harvard University Press, Cambridge, MA, USA.

Yaakov Kareev, Iris Lieberman, and Miri Lev. 1997. Through a narrow window: Sample size and the perception of correlation. Journal of Experimental Psychology, 126.

Fred Karlsson. 2007. Constraints on multiple center-embedding of clauses. Journal of Linguistics, 43.

Dan Klein and Christopher D. Manning. 2002. A generative constituent-context model for improved grammar induction. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

Dan Klein and Christopher D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.

Shalom Lappin and Stuart M. Shieber. 2007. Machine learning theory and practice as a source of insight into universal grammar. Journal of Linguistics, 43:1-34.

Richard L. Lewis and Shravan Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3).

Brian MacWhinney. 2000. The CHILDES project: Tools for analyzing talk. Lawrence Erlbaum Associates, Mahwah, NJ, third edition.

Brian McElree. 2001. Working memory and focal attention. Journal of Experimental Psychology: Learning, Memory and Cognition, 27(3).

George A. Miller and Stephen Isard. 1964. Free recall of self-embedded English sentences. Information and Control, 7.

George A. Miller. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63.

Elissa Newport, Henry Gleitman, and Lila Gleitman. 1977. Mother, I'd rather do it myself: Some effects and non-effects of maternal speech style. In Catherine E. Snow, editor, Talking to Children. Cambridge University Press, Cambridge.

Elissa Newport. 1990. Maturational constraints on language learning. Cognitive Science, 14.

Lisa Pearl and Jon Sprouse. 2013. Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem. Language Acquisition, 20.

Elias Ponvert, Jason Baldridge, and Katrin Erk. 2011. Simple unsupervised grammar induction from raw text with cascaded finite state models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon.

Philip Resnik. 1992. Left-corner parsing and psychological plausibility. In Proceedings of COLING, Nantes, France.

Douglas L.T. Rohde and David C. Plaut. 2003. Less is less in language acquisition. In Philip Quinlan, editor, Connectionist modelling of cognitive development. Psychology Press, Hove, UK.

Jenny R. Saffran, Elizabeth K. Johnson, Richard N. Aslin, and Elissa L. Newport. 1999. Statistical learning of tone sequences by human infants and adults. Cognition, 70(1).

Lynn Santelmann and Peter W. Jusczyk. 1998. Sensitivity to discontinuous dependencies in language learners: Evidence for limitations in processing space. Cognition, 69.

William Schuler, Samir AbdelRahman, Tim Miller, and Lane Schwartz. 2010. Broad-coverage incremental parsing using human-like memory constraints. Computational Linguistics, 36(1):1-30.

Yoav Seginer. 2007. Fast unsupervised incremental parsing. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.

Amanda Seidl, George Hollich, and Peter W. Jusczyk. 2003. Early understanding of subject and object wh-questions. Infancy, 4(3).

Jayaram Sethuraman. 1994. A constructive definition of Dirichlet priors. Statistica Sinica, 4.

Edward Stabler. 1994. The finite connectivity of linguistic structure. In Perspectives on Sentence Processing. Lawrence Erlbaum.

Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476).

Jurgen Van Gael, Yunus Saatci, Yee Whye Teh, and Zoubin Ghahramani. 2008. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning. ACM.

Jurgen van Gael, Andreas Vlachos, and Zoubin Ghahramani. 2009. The infinite HMM for unsupervised PoS tagging. In Proceedings of EMNLP.

Marten van Schijndel, Andy Exley, and William Schuler. 2013. A model of language processing as hierarchic sequential prediction. Topics in Cognitive Science, 5(3).


More information

SOME MINIMAL NOTES ON MINIMALISM *

SOME MINIMAL NOTES ON MINIMALISM * In Linguistic Society of Hong Kong Newsletter 36, 7-10. (2000) SOME MINIMAL NOTES ON MINIMALISM * Sze-Wing Tang The Hong Kong Polytechnic University 1 Introduction Based on the framework outlined in chapter

More information

ACTL5103 Stochastic Modelling For Actuaries. Course Outline Semester 2, 2014

ACTL5103 Stochastic Modelling For Actuaries. Course Outline Semester 2, 2014 UNSW Australia Business School School of Risk and Actuarial Studies ACTL5103 Stochastic Modelling For Actuaries Course Outline Semester 2, 2014 Part A: Course-Specific Information Please consult Part B

More information

SCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany

SCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany Journal of Reading Behavior 1980, Vol. II, No. 1 SCHEMA ACTIVATION IN MEMORY FOR PROSE 1 Michael A. R. Townsend State University of New York at Albany Abstract. Forty-eight college students listened to

More information

arxiv:cmp-lg/ v1 22 Aug 1994

arxiv:cmp-lg/ v1 22 Aug 1994 arxiv:cmp-lg/94080v 22 Aug 994 DISTRIBUTIONAL CLUSTERING OF ENGLISH WORDS Fernando Pereira AT&T Bell Laboratories 600 Mountain Ave. Murray Hill, NJ 07974 pereira@research.att.com Abstract We describe and

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused

More information

Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

Unsupervised Dependency Parsing without Gold Part-of-Speech Tags Unsupervised Dependency Parsing without Gold Part-of-Speech Tags Valentin I. Spitkovsky valentin@cs.stanford.edu Angel X. Chang angelx@cs.stanford.edu Hiyan Alshawi hiyan@google.com Daniel Jurafsky jurafsky@stanford.edu

More information

Grammars & Parsing, Part 1:

Grammars & Parsing, Part 1: Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Active Learning. Yingyu Liang Computer Sciences 760 Fall Active Learning Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven,

More information

Georgetown University at TREC 2017 Dynamic Domain Track

Georgetown University at TREC 2017 Dynamic Domain Track Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain

More information

Redirected Inbound Call Sampling An Example of Fit for Purpose Non-probability Sample Design

Redirected Inbound Call Sampling An Example of Fit for Purpose Non-probability Sample Design Redirected Inbound Call Sampling An Example of Fit for Purpose Non-probability Sample Design Burton Levine Karol Krotki NISS/WSS Workshop on Inference from Nonprobability Samples September 25, 2017 RTI

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

LTAG-spinal and the Treebank

LTAG-spinal and the Treebank LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)

More information

JONATHAN H. WRIGHT Department of Economics, Johns Hopkins University, 3400 N. Charles St., Baltimore MD (410)

JONATHAN H. WRIGHT Department of Economics, Johns Hopkins University, 3400 N. Charles St., Baltimore MD (410) JONATHAN H. WRIGHT Department of Economics, Johns Hopkins University, 3400 N. Charles St., Baltimore MD 21218. (410) 516 5728 wrightj@jhu.edu EDUCATION Harvard University 1993-1997. Ph.D., Economics (1997).

More information

Semantic and Context-aware Linguistic Model for Bias Detection

Semantic and Context-aware Linguistic Model for Bias Detection Semantic and Context-aware Linguistic Model for Bias Detection Sicong Kuang Brian D. Davison Lehigh University, Bethlehem PA sik211@lehigh.edu, davison@cse.lehigh.edu Abstract Prior work on bias detection

More information

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM

ISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Proceedings of 28 ISFA 28 International Symposium on Flexible Automation Atlanta, GA, USA June 23-26, 28 ISFA28U_12 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Amit Gil, Helman Stern, Yael Edan, and

More information

Understanding the Relationship between Comprehension and Production

Understanding the Relationship between Comprehension and Production Carnegie Mellon University Research Showcase @ CMU Department of Psychology Dietrich College of Humanities and Social Sciences 1-1987 Understanding the Relationship between Comprehension and Production

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

Task Tolerance of MT Output in Integrated Text Processes

Task Tolerance of MT Output in Integrated Text Processes Task Tolerance of MT Output in Integrated Text Processes John S. White, Jennifer B. Doyon, and Susan W. Talbott Litton PRC 1500 PRC Drive McLean, VA 22102, USA {white_john, doyon jennifer, talbott_susan}@prc.com

More information

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s)) Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other

More information

Informatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy

Informatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference

More information

Abstractions and the Brain

Abstractions and the Brain Abstractions and the Brain Brian D. Josephson Department of Physics, University of Cambridge Cavendish Lab. Madingley Road Cambridge, UK. CB3 OHE bdj10@cam.ac.uk http://www.tcm.phy.cam.ac.uk/~bdj10 ABSTRACT

More information

Probability estimates in a scenario tree

Probability estimates in a scenario tree 101 Chapter 11 Probability estimates in a scenario tree An expert is a person who has made all the mistakes that can be made in a very narrow field. Niels Bohr (1885 1962) Scenario trees require many numbers.

More information

SEMAFOR: Frame Argument Resolution with Log-Linear Models

SEMAFOR: Frame Argument Resolution with Log-Linear Models SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon

More information

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries Ina V.S. Mullis Michael O. Martin Eugenio J. Gonzalez PIRLS International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries International Study Center International

More information

cambridge occasional papers in linguistics Volume 8, Article 3: 41 55, 2015 ISSN

cambridge occasional papers in linguistics Volume 8, Article 3: 41 55, 2015 ISSN C O P i L cambridge occasional papers in linguistics Volume 8, Article 3: 41 55, 2015 ISSN 2050-5949 THE DYNAMICS OF STRUCTURE BUILDING IN RANGI: AT THE SYNTAX-SEMANTICS INTERFACE H a n n a h G i b s o

More information