The Acquisition and Use of Context-Dependent Grammars for English


Robert F. Simmons* (University of Texas)
Yeong-Ho Yu† (University of Texas)

This paper introduces a paradigm of context-dependent grammar (CDG) and an acquisition system that, through interactive teaching sessions, accumulates the CDG rules. The resulting context-sensitive rules are used by a stack-based, shift/reduce parser to compute unambiguous syntactic structures of sentences. The acquisition system and parser have been applied to the phrase structure and case analyses of 345 sentences, mainly from newswire stories, with 99% accuracy. Extrapolation from our current grammar predicts that about 25 thousand CDG rule examples will be sufficient to train the system in phrase structure analysis of most news stories. Overall, this research concludes that CDG is a computationally and conceptually tractable approach for the construction of sentence grammar for large subsets of natural language text.

1. Introduction

An enduring goal for natural language processing (NLP) researchers has been to construct computer programs that can read narrative, descriptive texts such as newspaper stories and translate them into knowledge structures that can answer questions, classify the content, and provide summaries or other useful abstractions of the text. An essential aspect of any such NLP system is parsing: translating the indefinitely long, recursively embedded strings of words into definite ordered structures of constituent elements. Despite decades of research, parsing remains a difficult computation that often results in incomplete, ambiguous structures, and computational grammars for natural languages remain notably incomplete. In this paper we suggest that a solution to these problems may be found in the use of context-sensitive rules applied by a deterministic shift/reduce parser. A system is described for rapid acquisition of a context-sensitive grammar based on ordinary news text.
The resulting grammar is accessed by deterministic, bottom-up parsers to compute phrase structure or case analyses of texts that the grammars cover. The acquisition system allows a linguist to teach a CDG grammar by showing examples of parsing successive constituents of sentences. At this writing, 16,275 example constituents have been shown to the system and used to parse 345 sentences ranging from 10 to 60 words in length, achieving 99% accuracy. These examples compress to a grammar of 3,843 rules that are equally effective in parsing. Extrapolation from our data suggests that acquiring an almost complete phrase structure grammar for AP Wire text will require about 25,000 example rules. The procedure is further demonstrated to apply directly to computing superficial case analyses from English sentences.

* Department of Computer Sciences, AI Lab, University of Texas, Austin TX
† Boeing Helicopter Computer Svces, Philadelphia, PA
© 1992 Association for Computational Linguistics

Computational Linguistics, Volume 18, Number 4

One of the first lessons in natural or formal language analysis is the Chomsky (1957) hierarchy of formal grammars, which classifies grammar forms from unrestricted rewrite rules, through context-sensitive, context-free, and the most restricted, regular grammars. It is usually conceded that pure, context-free grammars are not powerful enough to account for the syntactic analysis of natural languages (NL) such as English, Japanese, or Dutch, and most NL research in computational linguistics has used either augmented context-free or ad hoc grammars. The conventional wisdom is that context-sensitive grammars probably would be too large and conceptually and computationally intractable. There is also an unspoken supposition that the use of a context-sensitive grammar implies using the kind of complex parser required for parsing a fully context-sensitive language.

However, NL research based on simulated neural networks took a context-based approach. One of the first hints came from the striking finding from Sejnowski and Rosenberg's NETtalk (1988), that seven-character contexts were largely sufficient to map each character of a printed word into its corresponding phoneme, where each character actually maps in various contexts into several different phonemes. For accomplishing linguistic case analyses McClelland and Kawamoto (1986) and Miikkulainen and Dyer (1989) used the entire context of phrases and sentences to map string contexts into case structures. Robert Allen (1987) mapped nine-word sentences of English into Spanish translations, and Yu and Simmons (1990) accomplished comparable context-sensitive translations between English and German simple sentences. It was apparent that the contexts in which a word occurred provided information to a neural network that was sufficient to select correct word sense and syntactic structure for otherwise ambiguous usages of language.
In order to solve a problem of accepting indefinitely long, complex sentences in a fixed-size neural network, Simmons and Yu (1990) showed a method for training a network to act as a context-sensitive grammar. A sequential program accessed that grammar with a deterministic, single-path parser and accurately parsed descriptive texts. Continuing that research, 2,000 rules were accumulated and a network was trained using a back-propagation method. The training of this network required ten days of continuous computation on a Symbolics Lisp Machine. We observed that the training cost increased by more than the square of the number of training examples and calculated that 10,000-20,000 rules might well tax a supercomputer. So we decided that storing the grammar in a hash table would form a far less expensive option, provided we could define a selection algorithm comparable to that provided by the trained neural network.

In this paper we describe such a selection formula to select rules for context-sensitive parsing, a system for acquiring context-sensitive rules, and experiments in analysis and application of the grammar to ordinary newspaper text. We show that the application of context-sensitive rules by a deterministic shift/reduce parser is a conceptually and computationally tractable approach to NLP that may allow us to accumulate practical grammars for large subsets of English texts.

2. Context-Dependent Parsing

In NL research most interest has centered on context-free grammars (CFG), augmented with feature tests and transformations, used to describe the phrase structure of sentences. There is a broad literature on Generalized Phrase Structure Grammar (Gazdar et al. 1985), Unification Grammars of various types (Shieber 1986), and Augmented Transition Networks (J. Allen 1987). Gazdar (1988) calls attention to a subcategory of context-sensitive grammars called indexed languages and illustrates some applicability to natural languages, and Joshi illustrates an application of "mild context-sensitivity" (Joshi 1987), but in general, NL computation with context-sensitive grammars is a largely unexplored area.

While a few advanced NLP laboratories have developed grammars and parsing capabilities for significantly large subsets of natural language,[1] it cannot be denied that massive effort was required and that the results are plagued by ambiguous interpretations. These grammars are typically a context-free form, augmented by complex feature tests, transformations, and occasionally, arbitrary programs. The combination of even an efficient parser with such intricate grammars may greatly increase computational complexity of the parsing system (Tomita 1985). It is extremely difficult to write and maintain such grammars, and they must frequently be revised and retested to ensure internal consistency as new rules are added. We argue here that an acquisition system for accumulating context-sensitive rules and their application by a deterministic shift/reduce parser will greatly simplify the process of constructing and maintaining natural language parsing systems.

Although we use context-sensitive rules of the form $uxv \rightarrow uyv$, they are interpreted by a shift/reduce parser with the result that they can be applied successfully to the LR(k) subset of context-free languages. Unless the parser is augmented to include shifts in both directions, the system cannot parse context-sensitive languages. It is an open question as to whether English is or is not context-sensitive, but it definitely includes discontinuous constituents that may be separated by indefinitely many symbols. For this reason, future developments of the system may require operations beyond shift and reduce in the parser.
To avoid the easy misinterpretation that our present system applies to context-sensitive languages, we call it Context-Dependent Grammar (CDG).

We begin with the simple notion of a shift/reduce parser. Given a stack and an input string of symbols, the shift/reduce parser may only shift a symbol to the stack (Figure 1a) or reduce symbols on the stack by rewriting them as a single symbol (Figure 1b). We further constrain the parser to reduce no more than two symbols on the stack to a single symbol. The parsing terminates when the stack contains only a single root element and the input string is empty. Usually this class of parser applies a CFG to a sentence, but it is equally applicable to CDG.

[1] Notable examples include the large augmented CFGs at IBM Yorktown Hts, the Univ. of Pennsylvania, and the Linguistic Research Ctr. at the Univ. of Texas.

[Figure 1: Shift/reduce parser. (a) Shift operation: the next terminal of the input sentence is moved onto the stack. (b) Reduce operation: symbols on top of the stack are rewritten as a single non-terminal.]

2.1 CDG Rule Forms

The theoretical viewpoint is that the parse of a sentence is a sequence of states, each composed of a condition of the stack and the input string. The sequence ends successfully when the stack contains only the root element (e.g. SNT), and the input string is empty. Each state can be seen as the left half of a context-sensitive rule whose right half is the succeeding state.

$stack_s \; input_s \rightarrow stack_{s+1} \; input_{s+1}$

However, sentences may be of any length and are often more than forty words, so the resulting strings and stacks would form very cumbersome rules of variable lengths. To avoid this difficulty, the stack and input parts of a rule are limited to five symbols each. In the following example the stack and input parts are separated by the symbol "*" as the idea is applied to the sentence "The old man from Spain ate fish." The symbol _ stands for blank, art for article, adj for adjective, p for preposition, n for noun, and v for verb. The syntactic classes are assigned by dictionary lookup in a context-sensitive dictionary.[2]

    The old man from Spain ate fish
    art adj n   p    n     v   n

    _  _   _   _   _   * art adj n p n
    _  _   _   _   art * adj n   p n v
    _  _   _   art adj * n   p   n v n
    _  _   art adj n   * p   n   v n _
    _  _   _   art np  * p   n   v n _
    _  _   _   _   np  * p   n   v n _
    _  _   _   np  p   * n   v   n _ _
    _  _   np  p   n   * v   n   _ _ _
    _  _   _   np  pp  * v   n   _ _ _
    _  _   _   _   np  * v   n   _ _ _
    _  _   _   np  v   * n   _   _ _ _
    _  _   np  v   n   * _   _   _ _ _
    _  _   _   np  vp  * _   _   _ _ _
    _  _   _   _   snt * _   _   _ _ _

The analysis terminates with an empty input string and the single symbol "snt" on the stack, successfully completing the parse. Note that the first four operations can be described as shifts followed by the two reductions, adj n → np, and art np → np. Subsequently the p and n were shifted onto the stack and then reduced to a pp; then the np and pp on the stack were reduced to an np, followed by the shifting of v and n, their reduction to vp, and a final reduction of np vp → snt.

Illustrations similar to this are often used to introduce the concept of parsing in AI texts on natural language (e.g. J. Allen 1987). We could perfectly well record the grammar in pairs of successive states as follows:

    _ _ _  np p * n v n _ _  →  _ _ np p  n  * v n _ _ _
    _ _ np p  n * v n _ _ _  →  _ _ _  np pp * v n _ _ _

but some economy can be achieved by recording the operation and possible label as the right half of a rule. So for the example immediately above, we record:

    _ _ _  np p * n v n _ _  →  (S)
    _ _ np p  n * v n _ _ _  →  (R pp)

where S shifts and (R pp) replaces the top two elements of the stack with pp to form the next state of the parse. Thus a windowed context of ten symbols is created as the left half of a rule and an operation as the right half. Note that if the stack were limited to the top two elements, and the input to a single element, the rule system would reduce to a binary rule CFG.

The example in Figure 2 shows how a sentence "Treatment is a complete rest and a special diet" is parsed by a context-sensitive shift/reduce parser. Terminal symbols are lowercase, while nonterminals are uppercase. The shaded areas represent the parts of the context invisible to the system. The next operation is solely decided by the windowed context. It can be observed that the last state in the analysis is the single symbol SNT, the designated root symbol, on the stack along with an empty input string, successfully completing the parse. And this is the CDG form of rule used in the phrase structure analysis.

[2] Described in Section [...]

    Treatment is a complete rest and a special diet.
    ( n v det adj n cnj det adj n )

    Stack (bottom ... top)     Remaining input               Operation
    (empty)                    n v det adj n cnj det adj n   shift
    n                          v det adj n cnj det adj n     shift
    n v                        det adj n cnj det adj n       shift
    n v det                    adj n cnj det adj n           shift
    n v det adj                n cnj det adj n               shift
    n v det adj n              cnj det adj n                 reduce to NP
    n v det NP                 cnj det adj n                 reduce to NP
    n v NP                     cnj det adj n                 shift
    n v NP cnj                 det adj n                     shift
    n v NP cnj det             adj n                         shift
    n v NP cnj det adj         n                             shift
    n v NP cnj det adj n       (empty)                       reduce to NP
    n v NP cnj det NP          (empty)                       reduce to NP
    n v NP cnj NP              (empty)                       reduce to CNP
    n v NP CNP                 (empty)                       reduce to NP
    n v NP                     (empty)                       reduce to VP
    n VP                       (empty)                       reduce to S
    S                          (empty)                       done

Figure 2: An example of windowed context. Only the top five stack symbols and the first five input symbols of each state are visible to the system.

2.2 Algorithm for the Shift/Reduce Parser

The parser accepts a string of syntactic word classes as its input and forms a ten-symbol vector, five symbols each from the stack and the input string. It looks up this vector as the left half of a production in the grammar and interprets the right half of the production as an instruction to modify the stack and input sequences to construct the next state of the parse. To accomplish these tasks, it maintains two stacks, one for the input string and one for the syntactic constituents. These stacks may be arbitrarily large.

An algorithm for the parser is described in Figure 3. The most important part of this algorithm is to find an applicable CDG rule from the grammar. Finding such a rule is based on the current windowed context. If there is a rule whose left side exactly matches the current windowed context, that rule will be applied. However, realistically, it is often the case that there is no exact match with any rule. Therefore, it is necessary to find a rule that best matches the current context.
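The parsing loop of this section, together with the recording of rules from a taught parse, can be sketched in Python as follows. This is a minimal illustration rather than the authors' Lisp implementation: the function names (`window`, `apply_op`, `record_rules`, `parse`) and the tuple encoding of operations, `("S",)` for shift and `("R", label)` for reduce, are assumptions made here, and rule lookup is by exact match only, so best-match selection is left out.

```python
BLANK = "_"

# Example from Section 2.1: "The old man from Spain ate fish"
TAGS = ["art", "adj", "n", "p", "n", "v", "n"]
# Operations a teacher would supply: ("S",) shifts, ("R", x) reduces to x.
OPS = [("S",), ("S",), ("S",), ("R", "np"), ("R", "np"),
       ("S",), ("S",), ("R", "pp"), ("R", "np"),
       ("S",), ("S",), ("R", "vp"), ("R", "snt")]


def window(stack, inp, width=5):
    """Ten-symbol context: top five stack symbols, then first five input symbols."""
    top = [BLANK] * max(0, width - len(stack)) + stack[-width:]
    first = inp[:width] + [BLANK] * max(0, width - len(inp))
    return tuple(top + first)


def apply_op(op, stack, inp):
    """Shift the next input symbol, or rewrite the top two stack symbols."""
    if op[0] == "S":
        stack.append(inp.pop(0))
    else:
        stack[-2:] = [op[1]]


def record_rules(tags, operations):
    """Replay a taught parse and record one rule (context -> operation) per state."""
    rules = {}
    stack, inp = [], list(tags)
    for op in operations:
        rules[window(stack, inp)] = op
        apply_op(op, stack, inp)
    return rules


def parse(tags, consult, root="snt", max_steps=200):
    """Deterministic single-path parse; `consult` maps a context to an operation."""
    stack, inp = [], list(tags)
    for _ in range(max_steps):
        if not inp and stack == [root]:
            return stack
        op = consult(window(stack, inp))
        if op is None:
            raise LookupError("no rule matches the current context")
        apply_op(op, stack, inp)
    raise RuntimeError("parse did not terminate")


if __name__ == "__main__":
    rules = record_rules(TAGS, OPS)
    print(parse(TAGS, rules.get))
```

Passing `rules.get` as `consult` keeps the loop independent of how rules are selected, so a best-match lookup could be substituted without changing `parse`.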

    CD-SR-Parser(Input, Cdg)
      Input is a string of syntactic classes for the given sentence.
      Cdg is the given CDG grammar rules.

      Stack := empty
      do until (Input = empty and Stack = (SNT))
        Windowed-context := Append(Top-five(Stack), First-five(Input))
        Operation := Consult_CDG(Windowed-context, Cdg)
        if First(Operation) = SHIFT
          then Stack := Push(First(Input), Stack)
               Input := Rest(Input)
          else Stack := Push(Second(Operation), Pop(Pop(Stack)))
      end do

The functions Top-five and First-five return the lists of top (or first) five elements of the Stack and the Input respectively. If there are not enough elements, these procedures pad with blanks. The function Append concatenates two lists into one. Consult_CDG consults the given CDG rules to find the next operation to take. The details of this function are the subject of the next section. Push and Pop add or delete one element to/from a stack, while First and Second return the first or second elements of a list, respectively. Rest returns the given list minus the first element.

Figure 3: Context-sensitive shift/reduce parser.

2.3 Consulting the CDG Rules

There are two related issues in consulting the CDG rules. One is the computational representation of CDG rules, and the other is the method for selecting an applicable rule. In the traditional CFG paradigms, a CFG rule is applicable if the left-hand side of the rule exactly matches the top elements of the stack. However, in our CDG paradigm, a perfect match between the left side of a CDG rule and the current state cannot be assured, and in most cases, a partial match must suffice for the rule to be applied. Since many rules may partially match the current context, the best matching rule should be selected.

One way to do this is to use a neural network. Through the back-propagation algorithm (Rumelhart, Hinton, and Williams 1986), a feed-forward network can be trained to memorize the CDG rules. After successful training, the network can be used to retrieve the best matching rule.
However, this approach based on a neural network usually takes considerable training time. For instance, in our previous experiment (Simmons and Yu 1990), training a network for about 2,000 CDG rules took several days of computation. Therefore, this approach has an intrinsic problem for scaling up, at least on the present generation of neural net simulation software.

Another method is based on a hash table in which every CDG rule is stored according to its top two elements of the stack, that is, the fourth and fifth elements of the left half of the rule. Given the current windowed context, the top two elements of the stack are used to retrieve all the relevant rules from the hash table.
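The hash-table organization just described might look like the following Python sketch. It assumes, for illustration only, that a rule is a pair of a ten-symbol context tuple and an operation; the names `build_index` and `candidates` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_index(rules):
    """Group rules by the top two stack symbols of their left half.

    `rules` is an iterable of (context, operation) pairs, where `context`
    is a ten-symbol tuple. Positions 4 and 5 (indices 3 and 4) hold the
    top two elements of the stack.
    """
    index = defaultdict(list)
    for context, operation in rules:
        index[(context[3], context[4])].append((context, operation))
    return index


def candidates(index, context):
    """Return every stored rule that shares the context's top two stack symbols."""
    return index.get((context[3], context[4]), [])
```

Only the rules returned by `candidates` then need to be compared against the current context, which keeps each lookup confined to one subgroup of the grammar.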

We use no more than 64 word and phrase class symbols, so there can be no more than 4,096 possible pairs. The effect is to divide the large number of rules into no more than 4,096 subgroups, each of which will have a manageable subset. In fact, with 16,275 rules we discovered that we have only 823 pairs and the average number of rules per subgroup is 19.8; however, for frequently occurring pairs the number of rules in the subgroups can be much larger.

The problem is to determine what scoring formula should be used to find the rule that best matches a parsing context. Sejnowski and Rosenberg (1988) analyzed the weight matrix that resulted from training NETtalk and discovered a triangular function with the apex centered at the character in the window and the weights falling off in proportion to distance from that character. We decided that the best matching rule in our system would follow a similar pattern, with maximum weights for the top two elements on the stack and weights decreasing in both directions with distance from those positions. The scoring function we use is developed as follows:

Let $\mathcal{R}$ be the set of vectors $\{R_1, R_2, \ldots, R_n\}$, where $R_i$ is the vector $[r_1, r_2, \ldots, r_{10}]$.
Let $C$ be the vector $[c_1, c_2, \ldots, c_{10}]$.
Let $\mu(c_i, r_i)$ be a matching function whose value is 1 if $c_i = r_i$, and 0 otherwise.

$\mathcal{R}$ is the entire set of rules, $R_i$ is (the left half of) a particular rule, and $C$ is the parse context. Then $\mathcal{R}'$ is the subset of $\mathcal{R}$ where, if $R_i \in \mathcal{R}'$, then $\mu(r_{i4}, c_4) \cdot \mu(r_{i5}, c_5) = 1$. Access of the hash table with the top two elements of the stack, $c_4$, $c_5$, produces the set $\mathcal{R}'$. We can now define the scoring function for each $R_i \in \mathcal{R}'$:

$$Score = \sum_{i=1}^{3} \mu(c_i, r_i) \cdot i + \sum_{i=6}^{10} \mu(c_i, r_i)(11 - i)$$

The first summation scores the matches between the stack elements of the rule and the current context, and the second summation scores the matches between the elements in the input string. If two items of the rule and context match, the total score is increased by the weight assigned to that position.
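The scoring formula can be written directly as a weighted match count. The Python sketch below is an illustration under stated assumptions: contexts and rule left halves are ten-symbol tuples, `candidate_rules` already holds only rules whose fourth and fifth symbols equal those of the context (so those positions carry no weight), and ties are resolved in favor of the first candidate, which is a choice made here and not taken from the paper. The names `WEIGHTS`, `score`, and `best_rule` are hypothetical.

```python
# Position weights 1..3 for the lower stack symbols, 0 for the top two
# (assumed to match already), and 5..1 for the five input symbols.
WEIGHTS = (1, 2, 3, 0, 0, 5, 4, 3, 2, 1)


def score(context, rule_left):
    """Sum the weights of all positions where context and rule agree."""
    return sum(w for w, c, r in zip(WEIGHTS, context, rule_left) if c == r)


def best_rule(context, candidate_rules):
    """Pick the highest-scoring (left_half, operation) pair, or None if empty."""
    if not candidate_rules:
        return None
    return max(candidate_rules, key=lambda rule: score(context, rule[0]))
```

With these weights an identical context and rule score 1 + 2 + 3 + 5 + 4 + 3 + 2 + 1 = 21.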
The maximum score for a perfect match is 21 according to the above formula. From several experiments, varying the length of vector and the weights, particularly those assigned to blanks, it has been determined that this formula gave the best performance among those tested. More importantly, it has worked well in the current phrase structure and case analysis experiments.

It was an unexpected surprise to us[3] that using context-sensitive productions, an elementary, deterministic, parsing algorithm proved adequate to provide 99% correct, unambiguous analyses for the entire text studied.

[3] But perhaps not to Marcus (1980) and Berwick (1985), who promote the study of deterministic parsing.

3. Grammar Acquisition for CDG

Constructing an augmented phrase structure grammar of whatever type (unification, GPSG, or ATN) is a painful process usually involving a well-trained linguistic team of several people. These types of grammar require that a CFG recognition rule such as np vp → snt be supported by such additional information as the fact that the np and vp agree in number, that the np is characterized by particular features such as count, animate, etc., and that the vp can or cannot accept certain types of complements. The additional features make the rules exceedingly complex and difficult to prepare and debug. College students can be taught easily to make a phrase structure tree to represent a sentence, but it requires considerable linguistic training to deal successfully with a feature grammar.

We have seen in the preceding section that a CDG is derived from recording the successive states of the parses of sentences. Thus it was natural for us to develop an interactive acquisition system that would assist a linguist (or a student) in constructing such parses to produce easily large sets of example CDG rules.[4] The system continued to evolve as a consequence of our use until we had included capabilities to:

- read in text and data files
- compile dictionary and grammar tables from completed text files
- select a sentence to continue processing or revise
- look up words in a dictionary to suggest the syntactic class for the word in context when assigning syntactic classes to the words in a sentence
- compare each state of the parse with rules in the current grammar to predict the shift/reduce operation. A carriage return signals that the user accepts the prompt, or the typing in of the desired operation overrides it.
- compute and display the parse tree from the local grammar after completion of each sentence, or from the global total grammar at any time
- provide backing up and editing capability to correct errors
- print help messages and guide the user
- compile dictionary and grammar entries at the completion of each sentence, ensuring no duplicate entries
- save completed or partially completed grammar files.

The resulting tool, GRAMAQ, enables a linguist to construct a context-sensitive grammar for a text corpus at the rate of several sentences per hour.
Thousands of rules are accumulated with only weeks of effort in contrast to the years required for a comparable system of augmented CFG rules. About ten weeks of effort were required to produce the 16,275 rules on which this study is based. Since GRAMAQ's prompts become more accurate as the dictionary and grammar grow in size, there is a positive acceleration in the speed of grammar accumulation and the linguist's task gradually converges to one of alert supervision of the system's prompts.

A slightly different version of GRAMAQ is Caseaq, which uses operations that create case constituents to accumulate a context-sensitive grammar that transforms sentences directly to case structures with no intermediate stage of phrase structure trees. It has the same functionality as GRAMAQ but allows the linguist user to specify a case argument and value as the transformation of syntactic elements on the stack, and to rename the head of such a constituent by a syntactic label. Figure 9 in Section 7.3 illustrates the acquisition of case grammar.

[4] Starting with an Emacs editor, it was fairly easy to read in a file of sentences and to assign each word its syntactic class according to its context. Then the asterisk was inserted at the beginning of the syntactic string, the string was copied to the next line, the asterisk moved if a shift operation was indicated, or the top two symbols on the stack were rewritten if a reduce was required, just as we constructed the example in the preceding section. Naturally enough, we soon made Emacs macros to help us, and then escalated to a Lisp program that would print the stack-*-string and interpret our shift/reduce commands to produce a new state of the parse.

[Table 1: Characteristics of a sample of the text corpus. Columns: Text, States, Sentences, Wds/Snt, Mn-Wds/Snt. Rows: Hepatitis, Measles, News Story, APWire-Robots, APWire-Rocket, APWire-Shuttle, Total. The cell values are not preserved.]

4. Experiments with CDG

There are a number of critical questions that need to be answered if the claim that CDG grammars are useful is to be supported.

- Can they be used to obtain accurate parses for real texts?
- Do they reduce ambiguity in the parsing process?
- How well do the rules generalize to new texts?
- How large must a CDG be to encompass the syntactic structures for most newspaper text?

4.1 Parsing and Ambiguity with CDG

Over the course of this study we accumulated 345 sentences mainly from newswire texts. The first two articles were brief disease descriptions from a youth encyclopedia; the remaining fifteen were newspaper articles from February 1989 using the terms "star wars," "SDI," or "Strategic Defense Initiative." Table 1 characterizes typical articles by the number of CDG rules or states, number of sentences, the range of sentence lengths, and the average number of words per sentence.

We developed our approach to acquiring and parsing context-sensitive grammars on the first two simple texts, and then used GRAMAQ to redo those texts and to construct productions for the news stories. The total text numbered 345 sentences, which accumulated 16,275 context-sensitive rules, an average of 47 per sentence.

The parser embodying the algorithm illustrated earlier in Figure 3 was augmented to compare the constituents it constructed with those prescribed during grammar acquisition by the linguist.
In parsing the 345 sentences, 335 parses exactly matched the linguist's original judgement. In nine cases in which differences occurred, the parses were judged correct, but slightly different sequences of parse states occurred. The tenth case clearly made an attachment error, of an introductory adverbial phrase in the sentence "Hours later, Baghdad announced..." This was mistakenly attached to "Baghdad." This evaluation shows that the grammar was in precise agreement with the linguist 97% of the time and completed correct parses in 99.7% of the 345 sentences from which it was derived. Since our primary interest was in evaluating the effectiveness of the CDG, all these evaluations were based on using correct syntactic classes for the words in the sentences. The context-sensitive dictionary lookup procedure described in Section 7.3 is 99.5% accurate, but it assigns 40 word classes incorrectly. As a consequence, using this procedure would result in a reduction of about 10% accuracy in parsing.

[Figure 4: Sentence parse. Parse tree for the sentence "Another mission soon scheduled that also would have priority over the shuttle is the first firing of a trident two intercontinental range missile from a submerged submarine."]

An output of a sentence from the parser is displayed as a tree in Figure 4. Since the whole mechanism is coded in Lisp, the actual output of the system is a nested list that is then printed as a tree. Notice in this figure that the PP at the bottom modifies the NP composed of "the first firing of a trident two intercontinental range missile," not just the word "firing." Since the parsing is bottom-up, left-to-right, the constituents are formed in the natural order of words encountered in the sentence and the terminals of the tree can be read top-to-bottom to give their ordering in the sentence.

Although 345 sentences totaling 8,594 words is a small selection from the infinite set of possible English sentences, it is large enough to assure us that the CDG is a reasonable form of grammar.
Since the deterministic parsing algorithm selects a single interpretation, which we have seen almost perfectly agrees with the linguist's parsings, it is apparent that, at least for this size text sample, there is little difficulty with ambiguous interpretations.

5. Generalization of CDG

The purpose of accumulating sample rules from texts is to achieve a grammar general enough to analyze new texts it has never seen. To be useful, the grammar must generalize. There are at least three aspects of generalization to be considered.

- How well does the grammar generalize at the sentence level? That is, how well does the grammar parse new sentences that it has not previously experienced?
- How well does the grammar generalize at the operation level? That is, how well does the grammar predict the correct shift/reduce operation during acquisition of new sentences?
- How much does the rule retention strategy affect generalization? For instance, when the grammar predicts the same output as a new rule does, and the new rule is not saved, how well does the resulting grammar parse?

5.1 Generalization at the Sentence Level

The complete parse of a sentence is a sequence of states recognized by the grammar (whether it be CDG or any other). If all the constituents of the new sentence can be recognized, the new sentence can be parsed correctly. It will be seen in a later paragraph that with 16,275 rules, the grammar predicts the output of new rules correctly about 85% of the time. For the average sentence with 47 states, only 85% or about 40 states can be expected to be predicted correctly; consequently the deterministic parse will frequently fail. In fact, 5 of 14 new sentences parsed correctly in a brief experiment that used a grammar based on 320 sentences to attempt to parse the new, 20-sentence text. Considering that only a single path was followed by the deterministic parser, we predicted that a multiple-path parser would perform somewhat better for this aspect of generalization. In fact, our initial experiments with a beam search parser resulted in successful parses of 15 of the 20 new sentences using the same grammar based on the 320 sentences.

5.2 Generalization at the Operation Level

This level of generalization is of central significance to the grammar acquisition system.
When GRAMAQ looks up a state in the grammar it finds the best matching state with the same top two elements on the stack, and offers the right half of this rule as its suggestion to the linguist. How often is this prediction correct?

To answer this question we compiled the grammar of 16,275 rules in cumulative increments of 1,017 rules using a procedure, union-grammar, that would only add a rule to the grammar if the grammar did not already predict its operation. We call the result a "minimal-grammar," and it contains 3,843 rules. The black line of Figure 5 shows that with the first 1,000 rules 40% were new; with an accumulation of 5,000, 18% were new rules. By the time 16,000 rules have been accumulated, the curve has flattened to an average of 16% new rules added. This means that the acquisition system will make correct prompts about 84% of the time and the linguist will only need to correct the system's suggestions about 3 or 4 times in 20 context presentations.
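A retention procedure in the spirit of union-grammar can be sketched in Python as below. This is an illustrative reading, not the authors' code: a rule-example is kept only when the current grammar fails to predict its operation, and a tie between best-scoring rules with differing operations is counted as a failed prediction. Rules are assumed to be pairs of a ten-symbol context tuple and an operation, and the names `predicted_ops`, `union_grammar`, and `compile_minimal` are hypothetical.

```python
WEIGHTS = (1, 2, 3, 0, 0, 5, 4, 3, 2, 1)


def score(context, rule_left):
    """Weighted count of positions where the context and the rule agree."""
    return sum(w for w, c, r in zip(WEIGHTS, context, rule_left) if c == r)


def predicted_ops(grammar, context):
    """Operations of all best-scoring rules sharing the top two stack symbols."""
    group = [(left, op) for left, op in grammar if left[3:5] == context[3:5]]
    if not group:
        return set()
    best = max(score(context, left) for left, _ in group)
    return {op for left, op in group if score(context, left) == best}


def union_grammar(examples, grammar=()):
    """Add an example only if the grammar does not already predict its operation."""
    grammar = list(grammar)
    for context, op in examples:
        unpredicted = predicted_ops(grammar, context) != {op}
        if unpredicted and (context, op) not in grammar:
            grammar.append((context, op))
    return grammar


def compile_minimal(examples, max_passes=10):
    """Repeat union_grammar over the examples until a pass adds no rule."""
    grammar = []
    for passes in range(1, max_passes + 1):
        updated = union_grammar(examples, grammar)
        if len(updated) == len(grammar):
            return grammar, passes
        grammar = updated
    return grammar, max_passes
```

Repeating the pass matters because a rule added late can change which rule best matches an earlier example, so a single pass does not guarantee that every example is still predicted correctly.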

[Figure 5: Generalization of CDG rules. Horizontal axis: accumulated rules by thousands.]

5.3 Rule Retention and Generalization

If two parsing grammars account equally well for the same sentences, the one with fewer rules is less redundant, more abstract, and the one to be preferred. We used the union-grammar procedure to produce and study the minimal grammar for the 16,275 rules (rule-examples) derived from the sample text. Union-grammar records a new rule for a rule-example:[5]

1. if the best matching rule has an operation that doesn't match
2. if the best matching rule ties with another rule whose operation does not match
3. if 2 is true, and score = 21, we have a full contradiction and list the rule as an error.

[5] These definite conditions are due to an analysis by Mark Ring.

Six contradictions occurred in the grammar; five were inconsistent treatments of "SNT" followed by one or more punctuation marks, while the sixth offered both a shift and a "pp" for a preposition-noun followed by a preposition. The latter case is an attachment ambiguity not resolvable by syntax.

In the first pass as shown in Table 2, the text resulted in 3,194 rules compared with 16,275 possible rules. That is, 13,081 possible CDG rules were not retained because already existing rules would match and predict the operation. However, using those rules to parse the same text gave very poor results: zero correct parses at the sentence level. Therefore, the process of compiling a minimal grammar was repeated starting with those 3,194 rules. This time only 619 new rules were added. The purpose of this repetition is to get rid of the effect that the rules added later change the predictions made earlier. Finally, in a fourth repetition of the process no rules were new. The resulting grammar of 3,843 rules succeeds in parsing the text with only occasional minor errors in attaching constituents. It is to be emphasized that the unretained rules are similar but not identical to those in the minimal grammar.

[Table 2: Four passes with minimal grammar. Columns: Pass, Unretained, Retained, Total Rules. The cell values are not preserved.]

We can observe that this technique of minimal retention by "unioning" new rules to the grammar results in a compression of the order 16,275/3,843 or 4.2 to 1, without increase in error. If this ratio holds for larger grammars, then if the linguist accumulates 40,000 training-example rules to account for the syntax of a given subset of language, that grammar can be compressed automatically to about 10,000 rules that will accomplish the same task.

6. Predicting the Size of CDGs

When any kind of acquisition system is used to accumulate knowledge, one very interesting question is, when will the knowledge be complete enough for the intended application? In our case, how many CDG rules will be sufficient to cover almost all newswire stories? To answer this question, an extrapolation can be used to find a point when the solid line of Figure 5 intersects with the y-axis. However, the CDG curve is descending too slowly to make a reliable extrapolation.

Therefore, another question was investigated instead: when will the CDG rules include a complete set of CFG rules? Note that a CDG rule is equivalent to a CFG rule if the context is limited to the top two elements of the stack. What the other elements in the context accomplish is to make one rule preferable to another that has the same top two elements of the stack, but a different context.

We allow 64 symbols in our phrase structure analysis. That means there are $64^2$ possible combinations for the top two elements of the stack.
For each combination, there are 65 possible operations:⁶ a shift or a reduction to another symbol. Among 16,275 CDG rules, we studied how many different CFG rules can be derived by eliminating the context. We found 844 different CFG rules that used 600 different left-side pairs of symbols. This shows that a given context free pair of symbols averages 1.4 different operations.⁷ Then, as we did with CDG rules, we measured how many new CFG rules were added in an accumulative fashion. The shaded line of Figure 5 shows the result.

6. Actually, there are fewer than 65 possible operations since the stack elements can be reduced only to nonterminal symbols.

7. We actually use only 48 different symbols, so only 48^2 or 2,304 combinations could have occurred. The fraction 600/2,304 yields .26, the proportion of the combinatoric space that is actually used, so far.
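Deriving CFG rules by eliminating the context, and the ratios quoted above, can be illustrated with a short Python sketch. The function name `cfg_projection`, the tuple layout (stack written top first), and the toy rules are assumptions made for illustration; the numbers in the arithmetic are the ones reported in the text.

```python
def cfg_projection(cdg_rules):
    """Keep only the top two stack symbols and the operation of each CDG rule."""
    return {(stack[:2], op) for stack, _inp, op in cdg_rules}


# Toy CDG rules: (stack, top first; upcoming input; operation). Symbols are illustrative.
toy_rules = [
    (("n", "adj", "art"), ("vao", "n"), "np"),
    (("n", "adj", "_"), ("vao", "n"), "np"),         # same CFG rule, different context
    (("n", "adj", "art"), ("prep", "n"), "shift"),   # same pair, different operation
]
cfg_rules = cfg_projection(toy_rules)
left_pairs = {pair for pair, _op in cfg_rules}

# Arithmetic on the figures reported in the text.
possible_pairs = 64 ** 2           # pairs of top-two symbols for 64 symbols
used_fraction = 600 / 48 ** 2      # 600 observed pairs out of 2,304
ops_per_pair = 844 / 600           # CFG rules per observed pair
```

The three toy CDG rules collapse to two CFG rules over a single left-side pair, which is the same kind of reduction that takes the 16,275 rule-examples to 844 CFG rules over 600 pairs.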

Notice that the line has descended to about 1.5% errors at 16,000 rules. To make an extrapolation easier, a log-log graph shows the same data in Figure 6. From this graph, it can be predicted that, after about 25,000 CDG rules are accumulated, the grammar will encompass a CFG component that is 99% complete. Beyond this point, additional CDG rules will add almost no new CFG rules, but only fine-tune the grammar so that it can resolve ambiguities more effectively. Also, it is our belief that, after the CDG reaches that point, a multi-path, beam-search parser will be able to parse most newswire stories very reliably. This belief is based on our initial experiment that used a beam search parser to test generalization of the grammar to find parses for fifteen out of twenty new sentences.

Figure 6: Log-log plot of new CFG rules. (x-axis: Nbr of Accumulated Rules.) Extrapolation, the gray line, predicts that 99% of the context free pairs will be achieved with the accumulation of 25,000 context sensitive rules.

7. Acquiring Case Grammar

Explicating the phrase structure constituents of sentences is an essential aspect in computer recognition of meaning. Case analysis organizes the constituents into a hierarchical structure of labeled propositions. The propositions can be used directly to answer questions and are the basis of schemas, scripts, and frames that are used to add meaning to otherwise inexplicit texts. As a result of the experiments with acquiring CDG and exploring its properties for parsing phrase structures, we became fairly confident that we could generalize the system to acquisition and parsing based on a grammar that would compute syntactic case structures directly from syntactic strings. Direct translation from string to structure is supported by neural network experiments such as those by McClelland and Kawamoto (1986), Miikkulainen and Dyer (1989), Yu and Simmons (1990), and Leow and Simmons (1990). We reasoned that if we could acquire case grammar with something approaching the simplicity of acquiring phrase structure rules, the result could be of great value for NL applications.

7.1 Case Structure

Cook (1989) reviewed twenty years of linguistic research on case analysis of natural language sentences. He synthesized the various theories into a system that depends on the subclassification of verbs into twelve categories, and it is apparent from his review that with a fine subcategorization of verbs and nominals, case analysis can be accomplished as a purely syntactic operation--subject to the limitations of attachment ambiguities that are not resolvable by syntax. This conclusion is somewhat at variance with those AI approaches that require a syntactic analysis to be followed by a semantic operation that filters and transforms syntactic constituents to compute case-labeled propositions (e.g. Rim 1990), but it is consistent with the neural network experience of directly mapping from sentence to case structure, and with the AI research that seeks to integrate syntactic and semantic processing while translating sentences to propositional structures.

Linguistic theories of case structure have been concerned only with single propositions headed by verb predications; they have been largely silent with regard to the structure of noun phrases and the relations among embedded and sequential propositions. Additional conventions for managing these complications have been developed in Simmons (1984) and Alterman (1985) and are used here.

The central notion of a case analysis is to translate sentence strings into a nested structure of case relations (or predicates) where each relation has a head term and an indefinite number of labeled arguments. An argument may itself be a case relation. Thus a sentence, as in the examples below, forms a tree of case relations.

    The old man from Spain ate fish.
    (eat Agt (man Mod old From spain) Obj fish)

    Another mission scheduled soon is the first firing of a trident missile from a submerged submarine.
    (is Obj1 (mission Mod another Obj* (scheduled Vmod soon))
        Obj2 (firing Mod first Det the Of (missile Nmod trident Det a)
                     From (submarine Mod submerged Det a)))

Note that mission is in Obj* relation to scheduled.
This means the object of scheduled is mission, and the expression can be read as "another mission such that mission is scheduled soon." An asterisk as a suffix to a label always signals the reverse direction for the label. There is a small set of case relations for verb arguments, such as verbmodifier, agent, object, beneficiary, experiencer, location, state, time, direction, etc. For nouns there are determiner, modifier, quantifier, amount, nounmodifier, preposition, and reverse verb relations, agt*, obj*, ben*, etc. Prepositions and conjunctions are usually used directly as argument labels while sentence conjunctions such as because, while, before, after, etc. are represented as heads of propositions that relate two other propositions with the labels preceding, post, antecedent, and consequent. For example, "Because she ate fish and chips earlier, Mary was not hungry."

    (because Ante (ate Agt she Obj (fish And chips) Vmod earlier)
             Conse (was Vmod not Obj1 mary State hungry))

Verbs are subcategorized as vao, vabo, vo, va, vhav, vbe where a is agent, o is object, b is beneficiary and vhav is a form of have and vbe a form of be. So far, only the subcategory of time has been necessary in subcategorizing nouns to accomplish this form of case analysis, but in general, a lexical semantics is required to resolve syntactic attachment ambiguities. The complete set of case relations is presumed to be small, but no one has yet claimed a complete enumeration of them.

Other case systems such as those taught by Schank (1980) and Jackendoff (1983) classify predicate names into such primitives as Do, Event, Thing, Mtrans, Ptrans, Go, Action, etc., to approximate some form of "language of thought" but the present approach is less ambitious, proposing merely to represent in a fairly formal fashion the organization of the words in a sentence. Subsequent operations on this admittedly superficial class of case structures, when augmented with a system of shallow lexical semantics, have been shown to accomplish question answering, focus tracking of topics throughout a text, automatic outlining, and summarization of texts (Seo 1990; Rim 1990). One strong constraint on this type of analysis is that the resulting case structure must maintain all information present in the text so that the text may be exactly reconstituted from the analysis.

7.2 Syntactic Analysis of Case Structure

We've seen earlier that a shift/reduce-rename operation is sufficient to parse most sentences into phrase structures. Case structure, however, requires transformations in addition to these operations. To form a case structure it is frequently necessary to change the order of constituents and to insert case labels. Following Jackendoff's principle of grammatical constraint, which argues essentially that semantic interpretation is frequently reflected in the syntactic form, case transformations are accomplished as each syntactic constituent is discovered. Thus when a verb, say throw and an NP, say coconuts are on top of the stack, one must not only create a VP, but also decide the case, Obj, and form the constituent, (throw Obj coconuts).
This can be accomplished in customary approaches to parsing by using augmented context free recognition rules of the form:

    VP → VP NP / 1 obj 2

where the numbers following the slash refer to the text dominated by the syntactic class in the referenced position, (ordered left-to-right) in the right half of the rule. The resulting constituents can be accumulated to form the case analysis of a sentence (Simmons 1984). We develop augmented context-sensitive rules following the same principle. Let us look again at the example "The old man from Spain ate fish," this time to develop case relations.

    * art adj n from n vao n    ; shift
    art * adj n from n vao n    ; shift
    art adj * n from n vao n    ; shift
    art adj n * from n vao n    ; 1 mod 2    (man Mod old)
    art n * from n vao n        ; 1 det 2    (man Mod old Det the)
    n * from n vao n            ; shift
    n from * n vao n            ; shift
    n from n * vao n            ;            (man Mod old Det the From spain)
    n * vao n                   ; shift
    n vao * n                   ; 2 agt 1    (ate Agt (man Mod old ...))
    vao * n                     ; shift
    vao n *                     ; 1 obj 2    (ate Agt (man ...) Obj fish)

In this example the case transformation immediately follows the semicolon, and the result of the transformation is shown in parentheses further to the right. The result in the final constituent is:

    (ate Agt (man Mod old Det the From spain) Obj fish).

Note that we did not rename the syntactic constituents as NP or VP in this example, because we were not interested in showing the phrase structure tree. Renaming in case analysis need only be done when it is necessary to pass on information accumulated from an earlier constituent. For example, in "fish were eaten by birds," the CS parse is as follows:

    * n vbe ppart by n      ; shift
    n * vbe ppart by n      ; shift
    n vbe * ppart by n      ; shift
    n vbe ppart * by n      ; 1 vbe 2, vpasv    (eaten Vbe were)
    n vpasv * by n          ; 1 obj 2           (eaten Vbe were Obj fish)
    vpasv * by n            ; shift
    vpasv by * n            ; shift
    vpasv by n *            ; 1 prep 2          (birds Prep by)
    vpasv n *               ; 2 agt 1           (eaten Vbe were Obj fish Agt (birds Prep by))

Here, it was necessary to rename the combination of a past participle and its auxiliary as a passive verb, vpasv, so that the syntactic subject and object could be recognized as Obj and Agent, respectively. We also chose to use the argument name Prep to form (birds Prep by) so that we could then call that constituent Agent. We can see that the reduce operation has become a reduce-transform-rename operation where numbers refer to elements of the stack, the second term provides a case argument label, the ordering provides a transformation, and an optional fourth element may rename the constituent. A sample of typical case transformations is shown associated with the top elements of the stack in Table 3.

Table 3: Some typical case transformations for syntactic constituents. (Columns: Stack, Case-Transform.)
In this table, the first element of the stack is in the third position in the left side of the table, and the number 1 refers to that position, 2 to the second, and 3 to the first. As an aid to the reader the first two entries in the table refer literally by symbol rather than by reference to the stack. The symbols vao and vabo are subclasses of verbs that take, respectively, agent and object; and agent, beneficiary, and object. The symbol v.. refers to any verb. Forms of the verb be are referred to as vbe, and passivization is marked by relabeling a verb by adding the suffix -pasv.

Parsing case structures

From the discussion above we may observe that the flow of control in accomplishing a case parse is identical to that of a phrase structure parse. The difference lies in the fact that when a constituent is recognized (see Figure 7): in phrase structure, a new name is substituted for its stack elements, and a constituent is formed by listing the name and its elements; in case analysis, a case transformation is applied to designated elements on the stack to construct a constituent, and the head (i.e. the first element of the transformation) is substituted for its elements--unless a new name is provided for that substitution.

    CS-CASE-Parser(input, cdg)
      Input is a string of syntactic classes for the given sentence.
      Cdg is the given CDG grammar rules.

      stack := empty
      outputstack := empty
      do until (input = empty and 2nd(stack) = blank)
        window-context := append(top_five(stack), first_five(input))
        operation := consult_cdg(window-context, cdg)
        if first(operation) = SHIFT
          then stack := push(first(input), stack)
               input := rest(input)
          else stack := push(select(operation), pop(pop(stack)))
               outputstack := make_constituent(operation, outputstack)
      end do

Figure 7: Algorithm for case parse.

Consequently the algorithm used in phrase structure analysis is easily adapted to case analysis. The difference lies in interpreting and applying the operation to make a new constituent and a new stack. In the algorithm shown above, we revise the stack by attaching either the head of the new constituent, or its new name, to the stack resulting from the removal of all elements in the new constituent.
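The control loop of Figure 7 can be sketched in Python and run on the passive example "fish were eaten by birds". This is an illustration under several assumptions rather than the authors' code: the rule table is a hypothetical exact-match lookup on the top two stack symbols plus the next input class (the paper consults a ten-symbol context by best match), the tuple encoding of operations is invented here, position 1 is taken to be the top of the stack as in the passive trace, and a shift is assumed to push the word onto the output stack, a step Figure 7 leaves implicit.

```python
SHIFT = ("shift",)

# Hypothetical rule table: (second-from-top, top, next input class) -> operation.
# A reduce operation is (head position, case label, argument position, new name),
# with positions counted from the top of the stack (1 = top, 2 = the one below).
RULES = {
    (None, None, "n"): SHIFT,
    (None, "n", "vbe"): SHIFT,
    ("n", "vbe", "ppart"): SHIFT,
    ("vbe", "ppart", "by"): (1, "Vbe", 2, "vpasv"),   # "1 vbe 2, vpasv"
    ("n", "vpasv", "by"): (1, "Obj", 2, None),        # "1 obj 2"
    (None, "vpasv", "by"): SHIFT,
    ("vpasv", "by", "n"): SHIFT,
    ("by", "n", None): (1, "Prep", 2, None),          # "1 prep 2"
    ("vpasv", "n", None): (2, "Agt", 1, None),        # "2 agt 1"
}


def render(constituent):
    """Print a nested constituent in the parenthesized notation of the text."""
    if isinstance(constituent, str):
        return constituent
    return "(" + " ".join(render(part) for part in constituent) + ")"


def case_parse(tagged_words, rules=RULES):
    """Deterministic shift/reduce loop over (word, class) pairs."""
    remaining = list(tagged_words)
    stack, output = [], []            # class symbols and their constituents
    while remaining or len(stack) > 1:
        second = stack[-2] if len(stack) > 1 else None
        top = stack[-1] if stack else None
        nxt = remaining[0][1] if remaining else None
        op = rules[(second, top, nxt)]
        if op == SHIFT:
            word, word_class = remaining.pop(0)
            stack.append(word_class)
            output.append(word)       # assumption: shift also feeds the output stack
        else:
            head_pos, label, arg_pos, new_name = op
            head, arg = output[-head_pos], output[-arg_pos]
            head_class = stack[-head_pos]
            head = [head] if isinstance(head, str) else list(head)
            del stack[-2:]
            del output[-2:]
            stack.append(new_name or head_class)   # new name if given, else the head
            output.append(head + [label, arg])
    return output[0]


sentence = [("fish", "n"), ("were", "vbe"), ("eaten", "ppart"),
            ("by", "by"), ("birds", "n")]
result = render(case_parse(sentence))
```

With these nine rules the loop reproduces the final constituent of the passive trace, `(eaten Vbe were Obj fish Agt (birds Prep by))`. The exact-match table is the part most unlike the paper's system, where a scored match against stored contexts chooses the operation.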
The function select chooses either a new name if present, or the first element, the head of the operation. Make_constituent applies the transformation rule to form a new constituent from the output stack and pushes the constituent onto the output stack, which is first reduced by removing the elements used in the constituent. Again, the algorithm is a deterministic, first (best) path parser with behavior essentially the same as the phrase structure parser. But this version accomplishes transformations to construct a case structure analysis.

7.3 Acquisition System for Case Grammar

The acquisition system, like the parser, required only minor revisions to accept case grammar. It must apply a shift or any transformation to construct the new stack-string for the linguist user, and it must record the shift or transformation as the right half of a context-sensitive rule--still composed of a ten-symbol left half and an operation as the right half. Consequently, the system will be illustrated in Figure 9 rather than described in detail.

Earlier we mentioned the context-sensitive dictionary. This is compiled by associating with each word the linguist's in-context assignments of each syntactic word class in which it is experienced. When the dictionary is built, the occurrence frequencies of each word class are accumulated for each word. A primitive grammar of four-tuples terminating with each word class is also formed and hashed in a table of syntactic paths. The procedure to determine a word class in context first obtains the candidates from the dictionary. For each candidate wc, it forms a four-tuple, vec, by adding it to the cdr of each immediately preceding vec, stored in IPC. Each such vec is tested against the table of syntactic paths; if it has been seen previously, it is added to the list of IPCs, otherwise it is eliminated. If the union of first elements of the IPC list is a single word class, that is the choice. If not, the word's most frequent word class among the union of surviving classes for the word is chosen.

The effect of this procedure is to examine a context of plus and minus three words to determine the word class in question. Although a larger context based on five-tuple paths is slightly more effective, there is a tradeoff between accuracy and storage requirements. The word class selection procedure was tested on the 8,310 words of the 345-sentence sample of text.
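The selection procedure described above can be sketched in Python as follows. This is one reading of the description, offered as a hedged illustration: the names `train` and `select_classes`, the boundary symbol `#`, the padding of each sentence with three boundary symbols on either side, and the fallback used when no four-tuple has been seen before are all assumptions, and every word is assumed to be in the dictionary.

```python
from collections import Counter, defaultdict

PAD = "#"  # hypothetical sentence-boundary symbol


def train(tagged_sentences):
    """Build the word-class dictionary and the table of four-tuple paths."""
    freq = defaultdict(Counter)      # word -> frequency of each word class
    paths = set()                    # four-tuples of consecutive word classes
    for sent in tagged_sentences:
        for word, wc in sent:
            freq[word][wc] += 1
        classes = [PAD] * 3 + [wc for _, wc in sent] + [PAD] * 3
        for i in range(len(classes) - 3):
            paths.add(tuple(classes[i:i + 4]))
    return dict(freq), paths


def select_classes(words, freq, paths):
    """Choose one word class per word from a window of three words each side."""
    words = list(words)
    ipc = [(PAD,) * 4]               # seed; only its last three symbols are used
    chosen = []
    for k, word in enumerate(words + [None] * 3):
        candidates = [PAD] if word is None else list(freq[word])
        new = {prev[1:] + (wc,) for prev in ipc for wc in candidates}
        # Assumption: when no extension was seen in training, keep them all.
        ipc = [vec for vec in new if vec in paths] or list(new)
        if k >= 3:                   # first elements now describe word k - 3
            target = freq[words[k - 3]]
            firsts = sorted({vec[0] for vec in ipc})
            chosen.append(max(firsts, key=lambda wc: target[wc]))
    return chosen


training = [
    [("the", "art"), ("fish", "n"), ("swim", "v")],
    [("they", "pron"), ("fish", "v"), ("here", "adv")],
    [("the", "art"), ("fish", "n"), ("eat", "v")],
]
freq, paths = train(training)
tags = select_classes(["they", "fish", "here"], freq, paths)
```

In this toy dictionary "fish" is most often a noun, yet the four-tuple paths select the verb reading after "they", which is the kind of case where the path test improves on the most-frequent-class heuristic.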
A score of 99.52% correct was achieved, with 8,270 words correctly assigned. As a comparison, the most frequent category for a word resulted in 8,137 correct assignments for a score of 97.52%. Although there are only 3,298 word types with an average of 3.7 tokens per type, the occurrence of single word class usages for words in this sample is very high, thus accounting for the effectiveness of the simpler heuristic of assignment of the most frequent category. However, since the effect of misassignment of word class can often ruin the parse, the use of the more complex procedure is amply justified. Analysis of the 40 errors in word class assignment showed 7 confusions of nouns and verbs that will certainly cause errors in parsing; other confusions of adjective/noun, and adverb/preposition are less devastating, but still serious enough to require further improvements in the procedure.

The word class selection procedure is adequate to form the prompts in the lexical acquisition phase, but the statistics on parsing effectiveness given earlier depend on perfect word class assignments. Shown in Figure 8 is the system's presentation of a sentence and its requests for each word's syntactic class. The protocol in Figure 9 shows the acquisition of shift


Basic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English. Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

A Version Space Approach to Learning Context-free Grammars

A Version Space Approach to Learning Context-free Grammars Machine Learning 2: 39~74, 1987 1987 Kluwer Academic Publishers, Boston - Manufactured in The Netherlands A Version Space Approach to Learning Context-free Grammars KURT VANLEHN (VANLEHN@A.PSY.CMU.EDU)

More information

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data

What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data What s in a Step? Toward General, Abstract Representations of Tutoring System Log Data Kurt VanLehn 1, Kenneth R. Koedinger 2, Alida Skogsholm 2, Adaeze Nwaigwe 2, Robert G.M. Hausmann 1, Anders Weinstein

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS

AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic

More information

Characteristics of the Text Genre Realistic fi ction Text Structure

Characteristics of the Text Genre Realistic fi ction Text Structure LESSON 14 TEACHER S GUIDE by Oscar Hagen Fountas-Pinnell Level A Realistic Fiction Selection Summary A boy and his mom visit a pond and see and count a bird, fish, turtles, and frogs. Number of Words:

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Ch VI- SENTENCE PATTERNS.

Ch VI- SENTENCE PATTERNS. Ch VI- SENTENCE PATTERNS faizrisd@gmail.com www.pakfaizal.com It is a common fact that in the making of well-formed sentences we badly need several syntactic devices used to link together words by means

More information

Physics 270: Experimental Physics

Physics 270: Experimental Physics 2017 edition Lab Manual Physics 270 3 Physics 270: Experimental Physics Lecture: Lab: Instructor: Office: Email: Tuesdays, 2 3:50 PM Thursdays, 2 4:50 PM Dr. Uttam Manna 313C Moulton Hall umanna@ilstu.edu

More information

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready

More information

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,

More information

RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S

RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S N S ER E P S I M TA S UN A I S I T VER RANKING AND UNRANKING LEFT SZILARD LANGUAGES Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A-1997-2 UNIVERSITY OF TAMPERE DEPARTMENT OF

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

LEGO MINDSTORMS Education EV3 Coding Activities

LEGO MINDSTORMS Education EV3 Coding Activities LEGO MINDSTORMS Education EV3 Coding Activities s t e e h s k r o W t n e d Stu LEGOeducation.com/MINDSTORMS Contents ACTIVITY 1 Performing a Three Point Turn 3-6 ACTIVITY 2 Written Instructions for a

More information

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading

ELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

Designing a Computer to Play Nim: A Mini-Capstone Project in Digital Design I

Designing a Computer to Play Nim: A Mini-Capstone Project in Digital Design I Session 1793 Designing a Computer to Play Nim: A Mini-Capstone Project in Digital Design I John Greco, Ph.D. Department of Electrical and Computer Engineering Lafayette College Easton, PA 18042 Abstract

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Course Content Concepts

Course Content Concepts CS 1371 SYLLABUS, Fall, 2017 Revised 8/6/17 Computing for Engineers Course Content Concepts The students will be expected to be familiar with the following concepts, either by writing code to solve problems,

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Lecture 10: Reinforcement Learning

Lecture 10: Reinforcement Learning Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation

More information

Grade 4. Common Core Adoption Process. (Unpacked Standards)

Grade 4. Common Core Adoption Process. (Unpacked Standards) Grade 4 Common Core Adoption Process (Unpacked Standards) Grade 4 Reading: Literature RL.4.1 Refer to details and examples in a text when explaining what the text says explicitly and when drawing inferences

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

PowerTeacher Gradebook User Guide PowerSchool Student Information System

PowerTeacher Gradebook User Guide PowerSchool Student Information System PowerSchool Student Information System Document Properties Copyright Owner Copyright 2007 Pearson Education, Inc. or its affiliates. All rights reserved. This document is the property of Pearson Education,

More information

An Introduction to the Minimalist Program

An Introduction to the Minimalist Program An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:

More information

Grade 6: Correlated to AGS Basic Math Skills

Grade 6: Correlated to AGS Basic Math Skills Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and

More information

Rendezvous with Comet Halley Next Generation of Science Standards

Rendezvous with Comet Halley Next Generation of Science Standards Next Generation of Science Standards 5th Grade 6 th Grade 7 th Grade 8 th Grade 5-PS1-3 Make observations and measurements to identify materials based on their properties. MS-PS1-4 Develop a model that

More information

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company

WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company Table of Contents Welcome to WiggleWorks... 3 Program Materials... 3 WiggleWorks Teacher Software... 4 Logging In...

More information

Getting Started with Deliberate Practice

Getting Started with Deliberate Practice Getting Started with Deliberate Practice Most of the implementation guides so far in Learning on Steroids have focused on conceptual skills. Things like being able to form mental images, remembering facts

More information

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically

More information

Compositional Semantics

Compositional Semantics Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language

More information

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu

More information

Evolution of Symbolisation in Chimpanzees and Neural Nets

Evolution of Symbolisation in Chimpanzees and Neural Nets Evolution of Symbolisation in Chimpanzees and Neural Nets Angelo Cangelosi Centre for Neural and Adaptive Systems University of Plymouth (UK) a.cangelosi@plymouth.ac.uk Introduction Animal communication

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

Changing User Attitudes to Reduce Spreadsheet Risk

Changing User Attitudes to Reduce Spreadsheet Risk Changing User Attitudes to Reduce Spreadsheet Risk Dermot Balson Perth, Australia Dermot.Balson@Gmail.com ABSTRACT A business case study on how three simple guidelines: 1. make it easy to check (and maintain)

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

Ohio s Learning Standards-Clear Learning Targets

Ohio s Learning Standards-Clear Learning Targets Ohio s Learning Standards-Clear Learning Targets Math Grade 1 Use addition and subtraction within 20 to solve word problems involving situations of 1.OA.1 adding to, taking from, putting together, taking

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

Using Blackboard.com Software to Reach Beyond the Classroom: Intermediate

Using Blackboard.com Software to Reach Beyond the Classroom: Intermediate Using Blackboard.com Software to Reach Beyond the Classroom: Intermediate NESA Conference 2007 Presenter: Barbara Dent Educational Technology Training Specialist Thomas Jefferson High School for Science

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

How to Judge the Quality of an Objective Classroom Test

How to Judge the Quality of an Objective Classroom Test How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM

More information

South Carolina English Language Arts

South Carolina English Language Arts South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Constraining X-Bar: Theta Theory

Constraining X-Bar: Theta Theory Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,

More information

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

Chapter 2 Rule Learning in a Nutshell

Chapter 2 Rule Learning in a Nutshell Chapter 2 Rule Learning in a Nutshell This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the

More information

Writing Research Articles

Writing Research Articles Marek J. Druzdzel with minor additions from Peter Brusilovsky University of Pittsburgh School of Information Sciences and Intelligent Systems Program marek@sis.pitt.edu http://www.pitt.edu/~druzdzel Overview

More information

Learning Disability Functional Capacity Evaluation. Dear Doctor,

Learning Disability Functional Capacity Evaluation. Dear Doctor, Dear Doctor, I have been asked to formulate a vocational opinion regarding NAME s employability in light of his/her learning disability. To assist me with this evaluation I would appreciate if you can

More information

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Takako Aikawa, Lee Schwartz, Ronit King Mo Corston-Oliver Carmen Lozano Microsoft

More information

CONSTRUCTION OF AN ACHIEVEMENT TEST Introduction One of the important duties of a teacher is to observe the student in the classroom, laboratory and

CONSTRUCTION OF AN ACHIEVEMENT TEST Introduction One of the important duties of a teacher is to observe the student in the classroom, laboratory and CONSTRUCTION OF AN ACHIEVEMENT TEST Introduction One of the important duties of a teacher is to observe the student in the classroom, laboratory and in other settings. He may also make use of tests in

More information