QUERY TRANSLATION FOR CROSS-LANGUAGE INFORMATION RETRIEVAL BY PARSING CONSTRAINT SYNCHRONOUS GRAMMAR

FRANCISCO OLIVEIRA, FAI WONG, KA-SENG LEONG, CHIO-KIN TONG, MING-CHUI DONG

Faculty of Science and Technology, University of Macau, Macao
{olfran, derekfw, ma56538, ma66535, mcdong}@umac.mo

Abstract: With the availability of large amounts of multilingual documents, Cross-Language Information Retrieval (CLIR) has become an active research area in recent years. However, researchers often face the problem of the ambiguities inherent in natural languages. The task is even more challenging for Chinese because word boundaries are not marked in the sentence. This paper presents a Chinese-Portuguese query translation for CLIR based on a Machine Translation (MT) system that parses Constraint Synchronous Grammar (CSG). Unlike traditional transfer-based MT architectures, this model only requires a set of CSG rules, which model the syntactic structures of the two languages simultaneously, to perform the translation. Moreover, CSG can be used to resolve ambiguity at different levels as the parsing proceeds in order to generate a translation of good quality.

Keywords: Cross-Language Information Retrieval, Machine Translation, Constraint Synchronous Grammar

1. Introduction

The main objective of Cross-Language Information Retrieval (CLIR) is to retrieve information written in a language different from the language of the user's input query. This is especially helpful in Macau, where many documents are written in Chinese and Portuguese, since both are official languages in the territory. CLIR systems permit users to retrieve documents written not only in their native language but also in the other one. However, it is not easy to obtain high-quality query translations in any domain, due to the inherent ambiguities and linguistic phenomena of natural languages and the enormous amount of knowledge needed for disambiguation. In the literature, several approaches have been proposed.
In bilingual dictionary based approaches [1], the query translation is generated by looking up the entries of the lexicon. Although this is efficient in terms of translation, a large vocabulary is needed to cover all the words. Such coverage is not easy to achieve, and the problem is even worse with Chinese, which does not have word boundaries. Moreover, words in the dictionary usually have more than one translation, and it is difficult to select the best translation by considering the bilingual dictionary alone.

MT based approaches seem to be the ideal solution for CLIR, mainly because MT systems translate the sentence as a whole, and the translation ambiguity problem is solved during the analysis of the source sentence. Rule-based MT [2] applies a set of linguistic rules to translate the sentence. Since these rules are universal, they are domain independent. However, this approach often requires a large human cost in formulating rules, and it is hard to maintain consistency as the number of rules increases. In statistical approaches [3, 4], the translation is determined by estimating the probabilities of word translations and of sentence orderings from a parallel corpus. However, these approaches depend heavily on the parallel corpus. Example-based MT [5] searches for pieces of examples stored in a parallel corpus to generate the translation, but its results depend on the quality of the examples and on the similarity function defined.

In recent years, several versions of synchronous grammars [6] have been proposed for solving the non-isomorphic tree-based transduction problem and for providing solutions to Machine Translation. For example, Synchronous Tree Adjoining Grammar [7] was initially applied to semantics but was later considered for translation [8]. Multiple Context-Free Grammar [9] defines a set of functions for the non-terminal symbols in the productions in order to interpret the symbols in the target generation.
However, it is hard to describe discontinuous constituents in linguistic expressions [10]. Melamed [6] modeled MT as synchronous parsing based on Generalized Multitext Grammars, which maintain two sets of productions as components, one for each language, for modeling parallel texts. Although this formalism can describe detailed semantic information associated with a non-terminal, it is difficult to use for developing a practical MT system due to its lack of flexibility.

In this paper, we apply Constraint Synchronous Grammar (CSG) [10], a variation of synchronous grammar, as the kernel of the MT system that performs query translation for Chinese-Portuguese CLIR. CSG can express detailed feature structures such as gender, number, and agreement in each non-terminal constituent for performing the necessary disambiguation at each level, and it can express non-standard linguistic phenomena, including crossing dependencies and discontinuous constituents, in the inference rules. Moreover, CSG allows the parser to remove ambiguous parse trees as the parsing progresses by making use of the various linguistic features defined.

This paper is organized as follows: the design model of a Chinese-Portuguese query translation for CLIR by parsing CSG is given in Section 2. Chinese word segmentation is presented in Section 3. The parsing of Constraint-based Synchronous Grammar and the generation of the translation are detailed in Sections 4 and 5. Finally, the conclusion is given in Section 6.

[Figure 1. Kernel design of the proposed MT system. The Chinese input 我有兩個聰明的姐妹 passes through Words Rough Segmentation, Probabilistic Tagging (我/pro 有/v 兩/num 個/q 聰明的/a 姐妹/n), CSG Parsing, and Generation with Morphological Analysis, producing "Eu tenho duas irmãs inteligentes", which is passed to a Monolingual IR System. The MT kernel draws on a Training Corpus, POS Rules, a Lexicon, and a CSG Rules KB.]

2. Design Model of Chinese-Portuguese CLIR System

The kernel of the MT system proposed for application to CLIR is shown in Figure 1. The translation process begins with word segmentation of the given Chinese query, which is then tagged by a probabilistic tagger. Based on the segmented and tagged result, the source sentence is further analyzed by a modified generalized LR parser [11], guided by CSG rules, which infers the syntactic structure of the input in order to determine the target language sentential pattern.
This pattern is then morphologically analyzed in order to generate the translation of the source sentence. The translated sentence is then passed to a monolingual IR system for retrieving documents written in the target language.

3. Chinese Segmentation and Tagging Module

Whether Chinese words are segmented effectively and correctly is vital to obtaining a good translation result in MT systems involving Chinese. This is mainly because Chinese sentences, unlike those of Western languages such as Portuguese, have no delimiters between words. Moreover, there are many ambiguity problems in correctly segmenting a Chinese sentence. In our design model, we applied the N-Shortest-Paths method [12] for generating a set of rough segmentation results for Chinese sentences.

3.1 Words Rough Segmentation Model

For a given Chinese sentence, a directed graph is constructed with each of its atomic characters as the vertices $(V_1, V_2, \ldots, V_n)$. Edges between the vertices are determined by the probabilities of the atomic characters or of the word combinations obtained from the Chinese corpus. Let $W$ be one of the possible segmentations of the Chinese sentence $C$; then the probability of $W$ given $C$ is defined as:

$$P(W \mid C) = \frac{P(W)\,P(C \mid W)}{P(C)} \qquad (1)$$

Since the probability of the Chinese sentence $P(C)$ is a constant, and the probability of $C$ given $W$ must be 1, the objective is to determine the N different segmentations that have the N largest probabilities $P(W)$. Suppose that a possible segmentation sequence $W$ consists of the words $w_1, w_2, \ldots, w_m$; then the probability $P(w_i)$ can be approximated as:

$$P(w_i) = \frac{k_i + 1}{\sum_{j=0}^{m} k_j + V}$$

where $k_i$ is the number of occurrences of $w_i$ and $V$ is the number of word types in the training corpus. Smoothing is applied by adding a constant in the numerator, taking into consideration that $w_i$ may not appear in the training corpus. Assuming for simplicity that the context within the sentence is not considered, the best word sequence $W^{*}$ can be computed as

$$W^{*} = \arg\max_{W} P(W) = \arg\max_{W} \prod_{i=1}^{m} P(w_i) \qquad (2)$$

$$W^{*} = \arg\max_{W} \prod_{i=1}^{m} \frac{k_i + 1}{\sum_{j=0}^{m} k_j + V} \qquad (3)$$

The candidate segmented Chinese sentences are then tagged by a probabilistic tagger [13] based on the Hidden Markov Model [14] to determine the final segmented sentence and the best POS tag for each word.

4. Parsing Constraint Synchronous Grammar

Constraint Synchronous Grammar [10] extends the formalism of Context Free Grammar (CFG) to the synchronous case. A CSG consists of a set of production rules that describe the sentential patterns of the source text and the target translation patterns.

4.1 Definition of CSG

In CSG, every production rule is of the form

    S → NP1 PP NP2 VP* NP3 {
        [NP1 VP a NP3 NP2] ; VP_cat = vb1 & PP = 把 & VP_s:sem = NP1_sem & VP_o:sem = NP2_sem & VP_io:sem = NP3_sem
        [NP1 VP NP2 em NP3] ;
    }

In this production rule, two generative rules are associated with the source sentential pattern NP1 PP NP2 VP NP3. The suitable generative rule is determined by the control conditions defined for each rule: the one whose conditions are all satisfied determines the relationship between the source and target sentential patterns. For example, if the category of VP is vb1, the preposition given is 把, and the senses of the subject, direct object, and indirect object governed by the verb VP correspond to those of the first, second, and third noun phrases (NP), then the source pattern NP1 PP NP2 VP NP3 is associated with the target pattern NP1 VP a NP3 NP2. The asterisk "*" indicates the head element; it propagates all the related features and linguistic information of the head symbol to the reduced non-terminal symbol on the left-hand side.
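The selection of a generative rule by its control conditions can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GenerativeRule`, `Production`, and `select_target_pattern`, the dictionary encoding of features, and the sense values in `bindings` are all hypothetical. The keys `s:sem`, `o:sem`, and `io:sem` stand for the senses the verb expects of its subject, direct object, and indirect object, following the explanation in the text. The second generative rule is treated as an unconditional fallback because the text lists no conditions for it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical encoding: each constituent is a flat feature dictionary.
Features = Dict[str, str]
Bindings = Dict[str, Features]            # e.g. {"NP1": {...}, "VP": {...}}
Condition = Callable[[Bindings], bool]


@dataclass
class GenerativeRule:
    target: List[str]                      # target sentential pattern
    conditions: List[Condition] = field(default_factory=list)


@dataclass
class Production:
    lhs: str
    source: List[str]                      # source sentential pattern
    head: str                              # symbol marked with '*'
    rules: List[GenerativeRule]


def select_target_pattern(prod: Production, b: Bindings) -> Optional[List[str]]:
    """Return the target pattern of the first generative rule whose
    control conditions all hold, or None if no rule applies."""
    for rule in prod.rules:
        if all(cond(b) for cond in rule.conditions):
            return rule.target
    return None


# The production from the text: S -> NP1 PP NP2 VP* NP3 { ... }
ba_rule = Production(
    lhs="S",
    source=["NP1", "PP", "NP2", "VP", "NP3"],
    head="VP",
    rules=[
        GenerativeRule(
            target=["NP1", "VP", "a", "NP3", "NP2"],
            conditions=[
                lambda b: b["VP"]["cat"] == "vb1",
                lambda b: b["PP"]["lex"] == "把",
                lambda b: b["VP"]["s:sem"] == b["NP1"]["sem"],
                lambda b: b["VP"]["o:sem"] == b["NP2"]["sem"],
                lambda b: b["VP"]["io:sem"] == b["NP3"]["sem"],
            ],
        ),
        # No conditions are given for this rule in the text, so it is
        # modelled here as an unconditional fallback (assumption).
        GenerativeRule(target=["NP1", "VP", "NP2", "em", "NP3"]),
    ],
)

# Illustrative feature values (assumed, not taken from the paper).
bindings: Bindings = {
    "NP1": {"sem": "human"},
    "PP": {"lex": "把"},
    "NP2": {"sem": "object"},
    "VP": {"cat": "vb1", "s:sem": "human", "o:sem": "object", "io:sem": "human"},
    "NP3": {"sem": "human"},
}

print(select_target_pattern(ba_rule, bindings))
# ['NP1', 'VP', 'a', 'NP3', 'NP2']
```

Keeping the conditions as plain predicates over the bound constituents makes it easy to add further generative rules for the same source pattern without touching the selection loop.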
The use of "*" thus achieves the property of feature inheritance in the CSG formalism. The relationship between source and target constituents is established by the given subscripts, and the sequence is based on the target sentential pattern. As an example, in the first generative rule, NP1 VP a NP3 NP2, although the first NP in the source pattern corresponds to the first NP in the target one, the positions of the verb and of the second and third noun phrases change in the target sentential pattern.

Understanding the ordering of constituents in the target sentential pattern is very important because it affects not only the grammatical correctness of the sentence but also its meaning. For example, suppose that the phrase 貧窮的人 (a poor man) is to be translated. If word-by-word translation is applied, it will be translated as pobre homem. Although this translation is grammatically correct, its meaning is wrong, because the position of an adjective relative to its noun in Portuguese may change the meaning. In this case, pobre homem means a pitiful man and not a poor man. This problem can be easily solved by defining a CSG production rule that has different generative rules associated with the same source sentential pattern, where each of these rules is controlled by different conditions. As a result, the source phrase 貧窮的人 will be translated as homem pobre (a poor man) instead of pobre homem.

4.2 Feature Descriptors in Attribute Value Matrix

In this model, semantic information is represented by feature descriptors (FDs), which give additional flexibility in defining CSG rules for establishing agreement in syntactic and sub-categorization dependencies. Feature descriptors related to a single lexical word or a constituent are encoded in attribute value matrices (AVMs). Each FD is a set of pairs of the form a = v, where a is an attribute and v is a value, either an atomic symbol or, recursively, an FD. Moreover, feature unification is performed during the parsing stage. If the FDs of each lexical word or constituent are compatible with each other, i.e. there are no conflicts in the values of any of the attributes defined, unification succeeds and a new FD is constructed. As an example, consider that a new noun phrase is to be reduced from the words 探測 (probe) and 石油 (petroleum), whose AVMs are shown below.

    探測:  c-lexical = 探測, category = NP
           p-lexical = tenteamento, sense = medicine
           p-lexical = sondagem, sense = nature (object)
    石油:  c-lexical = 石油, category = NP
           p-lexical = petróleo, sense = nature (object)

Figure 2. AVMs of the words 探測 and 石油

If the control condition defined by the rule requires that the senses of the noun phrases be equal to each other, then the unification will select the meaning sondagem (probe), since this sense can be unified with that of petróleo (petroleum). In traditional unification-based approaches [15], if the FDs of the lexical words or constituents are not compatible with each other during the unification process, nothing is returned. Consequently, if even one of the FD unifications fails, all the related candidate words are rejected without any flexibility in choosing the next preferable or probable candidate. Thus, in our design, each feature is associated with an initial weight, and ranking is performed during the parsing stage for choosing the most suitable candidate word. Suppose that the AVMs of the words 死屍 (corpse) and 漂浮 (to float) are as shown below:

    死屍:  c-lexical = 死屍, category = NP
           p-lexical = desenterrado, sense = human
    漂浮:  c-lexical = 漂浮, category = VP
           p-lexical = flutuar, sense = living creature
           p-lexical = parar, sense = transportation

Figure 3. AVMs of the words 死屍 and 漂浮

During the parsing stage, if the control condition requires that the sense of NP be equal to the sense of the subject governed by VP, weights are assigned during the validation process and the candidate with the highest weight is selected for unification.
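A minimal Python sketch of this weighted selection follows. It uses the weighting policy that the paper states in the next paragraph: 1 when the senses unify, 0.5 when one sense is an inherited hypernym of the other. The `HYPERNYM` table, the function names, and the representation of candidates as (word, sense) pairs are hypothetical; only the human / living creature relation and the lexical entries come from the paper's figures.

```python
from typing import Dict, List, Optional, Tuple

# Toy sense hierarchy, child -> parent. Only the human / living creature
# link is taken from the paper's example; the table is a stand-in (assumption).
HYPERNYM: Dict[str, str] = {"human": "living creature"}


def is_ancestor(ancestor: str, sense: str) -> bool:
    """True if `ancestor` is an inherited hypernym of `sense`."""
    seen = set()
    while sense in HYPERNYM and sense not in seen:
        seen.add(sense)
        sense = HYPERNYM[sense]
        if sense == ancestor:
            return True
    return False


def sense_weight(a: str, b: str) -> float:
    """1 if the senses unify, 0.5 if one is a hypernym of the other, else 0."""
    if a == b:
        return 1.0
    if is_ancestor(a, b) or is_ancestor(b, a):
        return 0.5
    return 0.0


def best_candidate(
    required_sense: str, candidates: List[Tuple[str, str]]
) -> Optional[Tuple[str, float]]:
    """Rank (word, sense) candidates and return the best (word, weight),
    or None when every candidate scores 0 (plain unification failure)."""
    scored = [(sense_weight(required_sense, sense), word) for word, sense in candidates]
    weight, word = max(scored, key=lambda pair: pair[0])
    return (word, weight) if weight > 0 else None


# Figure 3: the subject 死屍 has sense "human"; 漂浮 has two candidate readings.
print(best_candidate("human", [("flutuar", "living creature"), ("parar", "transportation")]))
# ('flutuar', 0.5)

# Figure 2: 石油 has sense "nature (object)"; 探測 has two candidate readings.
print(best_candidate("nature (object)", [("tenteamento", "medicine"), ("sondagem", "nature (object)")]))
# ('sondagem', 1.0)
```

Strict unification is the special case in which only weight 1 is accepted; the 0.5 tier is what lets flutuar survive where a traditional unifier would return nothing.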
The assignment of weights is based on the following policies: if unification can be performed between the senses of the lexical words or constituents, then the weight is increased by 1; if unification fails but the sense of one word is an inherited hypernym of the other, or vice versa, the weight is increased by 0.5. The FDs with the highest weight are chosen as the most preferable content. In this example, the traditional unification approach would simply return failure. Although there is no exact match between the senses of 死屍 (corpse) and 漂浮 (to float), since the sense human is a hyponym of the sense living creature, the FD of the Portuguese word flutuar (to float) will still be unified with the FD of desenterrado (corpse) and selected as the most suitable candidate.

4.3 Expressiveness of CSG

As mentioned previously, CSG can be used to describe non-standard linguistic phenomena. For example, consider the bilingual sentence pair:

    她/NP1 把/PP 兩支鋼筆/NP2 借給了/VP 佩德羅/NP3
    Ela emprestou ao Pedro duas canetas (She lent two pens to Peter)

Linguistic expressions in one language often do not appear in the translation in the other language. For instance, the preposition PP does not appear in any of the target rules. Moreover, the Chinese preposition 把 and the verb 借給了 should together be paired with the Portuguese verb emprestar (to lend). These observations show that CSG can express not only structural deviations between two different languages, but also discontinuous constituent relationships in the Chinese component.

4.4 CSG Parser

The CSG formalism is parsed by a modified version of the generalized LR algorithm [11] that takes the feature constraints and the inference of the target structure into consideration. The main reason for choosing this algorithm is its considerable efficiency over Earley's parsing algorithm [16], which requires a set of computations of LR items at each stage of parsing [11]. Furthermore, the parsing table is extended by adding the feature constraints and the target rules to the actions table.
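To make concrete how a target pattern reorders the source constituents of the 她把兩支鋼筆借給了佩德羅 example, and why 把 leaves no trace in the output, here is a short Python sketch. It assumes that the translation of each constituent is already available (hard-coded below) and that any pattern symbol not bound to a source constituent, such as `a`, is a literal target word; the function name `realize` is hypothetical.

```python
from typing import Dict, List, Tuple


def realize(target_pattern: List[str], translated: Dict[str, str]) -> List[str]:
    """Emit target tokens in the order given by the target pattern.
    Symbols bound to a source constituent are replaced by that constituent's
    translation; any other symbol is emitted as a literal target word."""
    return [translated.get(symbol, symbol) for symbol in target_pattern]


# Source constituents in Chinese order.
source: List[Tuple[str, str]] = [
    ("NP1", "她"), ("PP", "把"), ("NP2", "兩支鋼筆"), ("VP", "借給了"), ("NP3", "佩德羅"),
]

# Constituent translations, assumed to come from lower nodes of the parse tree.
# PP has no entry: 把 is absorbed into the verb and never appears in the pattern.
translated = {"NP1": "Ela", "NP2": "duas canetas", "VP": "emprestou", "NP3": "o Pedro"}

pattern = ["NP1", "VP", "a", "NP3", "NP2"]
tokens = realize(pattern, translated)
print(" ".join(tokens))
# Ela emprestou a o Pedro duas canetas
```

The output still contains the uncontracted sequence "a o"; merging it into "ao" is a separate post-processing step in the generation stage.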

5. Generation of the translation

Once the parse tree is constructed, the translation of the input sentence is generated by referencing the set of generative target sentential patterns that were selected previously. Each node of the parse tree has an associated target sentential pattern, which is used to generate the corresponding translation. Moreover, to ensure that the system generates a grammatically correct translation in Portuguese, we employ unification of functional descriptors (FDs) as a validation operation for each node. Since the AVMs were constructed for each constituent node in the parsing stage, they are reused during the generation phase. Because most of the Portuguese words defined in FDs are in their base word form, they need to be changed based on a set of grammatical agreement rules. Thus, extra FDs are added to the AVM, depending on its part of speech, for checking the dependencies between Portuguese words in order to generate the target translation correctly. These extra attributes include number, gender, tense, and person. As an example, consider the parse tree of the sentence 她把兩支鋼筆借給了佩德羅 shown in Figure 4.

    S {NP1 VP a NP3 NP2}
        NP1
            pro 她
        PP
            p 把
        NP2 {num NP4}
            num 兩
            NP4 {q NP5}
                q 支
                NP5
                    n 鋼筆
        VP
            v 借給了
        NP3
            npr 佩德羅

Figure 4. Example of a parse tree

Suppose that the translation of the noun phrase 兩支鋼筆/NP2 (two pens), with the target pattern num NP4, is to be generated. The meanings of the words 兩 (two) and 鋼筆 (pen) obtained from the bilingual dictionary are dois and caneta respectively. The FDs of 兩 and 支鋼筆 and their related information are shown below.

    兩:      c-lexical = 兩, category = q, p-lexical = dois
             FD1 = [gender = male, number = plural]
    支鋼筆:  c-lexical = 支鋼筆, category = NP4, p-lexical = caneta
             FD2 = [gender = female, number = singular]

Figure 5. AVMs of the words 兩 and 支鋼筆

Unification of FD1 and FD2 will fail because the gender and the number are different. In such a case, the necessary conversions are performed so that FD1 and FD2 become compatible with each other. Therefore, the generated result for 兩支鋼筆 is duas canetas (two pens).
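The conversion step above, together with the contraction step that this section describes later, can be sketched in Python as follows. The paper does not specify how conflicting features are reconciled, so the policy used here (gender taken from the noun, number taken from the numeral) is an assumption, as are the `FORMS` lexicon, the function names, and the token-level treatment of contractions, which ignores whether "a" is a preposition or an article.

```python
from typing import Dict, List, Tuple

# Tiny hand-written inflection lexicon keyed by (gender, number); hypothetical.
FORMS: Dict[str, Dict[Tuple[str, str], str]] = {
    "dois": {("male", "plural"): "dois", ("female", "plural"): "duas"},
    "caneta": {("female", "singular"): "caneta", ("female", "plural"): "canetas"},
}


def agree_num_noun(
    numeral: str, num_fd: Dict[str, str], noun: str, noun_fd: Dict[str, str]
) -> List[str]:
    """Resolve an FD conflict between a numeral and its noun.
    Assumed policy: gender is fixed by the noun, number by the numeral."""
    key = (noun_fd["gender"], num_fd["number"])
    return [FORMS[numeral][key], FORMS[noun][key]]


# Standard Portuguese preposition + article contractions (small subset).
CONTRACTIONS: Dict[Tuple[str, str], str] = {
    ("a", "o"): "ao", ("a", "os"): "aos", ("a", "a"): "à", ("a", "as"): "às",
    ("em", "o"): "no", ("em", "a"): "na", ("de", "o"): "do", ("de", "a"): "da",
}


def contract(tokens: List[str]) -> List[str]:
    """Merge adjacent preposition + article pairs into their contracted form."""
    out: List[str] = []
    for tok in tokens:
        if out and (out[-1], tok) in CONTRACTIONS:
            out[-1] = CONTRACTIONS[(out[-1], tok)]
        else:
            out.append(tok)
    return out


fd1 = {"gender": "male", "number": "plural"}      # FD1 of 兩 (dois)
fd2 = {"gender": "female", "number": "singular"}  # FD2 of 支鋼筆 (caneta)
print(" ".join(agree_num_noun("dois", fd1, "caneta", fd2)))
# duas canetas

print(" ".join(contract("Ela emprestou a o Pedro duas canetas".split())))
# Ela emprestou ao Pedro duas canetas
```

A real generator would derive the inflected forms from morphological rules rather than a lookup table; the table keeps the sketch self-contained.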
Similarly, since the verb phrase 借給了/VP (lent) must agree with NP1 and must have the correct tense, the Portuguese word emprestar (to lend) is converted to emprestou (third person, past tense). Besides unification, articles may need to be restored for each noun phrase where necessary. For example, the noun phrase 佩德羅/NP3 (Peter) requires the article o before the Portuguese word Pedro. After all the unifications and article restorations, the sentence becomes Ela emprestou a o Pedro duas canetas. However, the generated sentence is still not entirely correct, because some words are contracted in Portuguese grammar. In this case, the preposition a and the article o should be contracted into the single word ao. Thus, an extra module that checks whether contractions are needed is called last, and the output of the generation module is Ela emprestou ao Pedro duas canetas (She lent two pens to Peter).

6. Conclusion

In this paper, we proposed a Chinese-Portuguese query translation for CLIR based on an MT system that parses Constraint Synchronous Grammar. In this architecture, a set of rough segmentation results is generated from the given Chinese sentence, and after all of these candidate sentences are tagged, the one with the highest score is selected. The sentence is then parsed to infer its syntactic structure based on Constraint-based Synchronous Grammar. Unlike transfer-based MT architectures, where the translation process is carried out in sequence by different analytical phases, parsing CSG rules allows the corresponding target sentential pattern to be inferred immediately, so our approach can reduce information loss during the transfer process. After the parse tree is constructed, it is used for generating the translation, with the assistance of unification between the functional descriptors defined, to guarantee the correctness of the grammar and the quality of the translation. The proposed MT model can remove different types of ambiguity at different stages to enhance the quality of the translation: the creation of word boundaries in the segmentation module removes ambiguity between Chinese words; part-of-speech ambiguity is removed by the probabilistic tagger; structural ambiguities in parse trees are removed by parsing CSG; and lexical ambiguities, where words may have more than one meaning (usually referred to as the problem of word sense disambiguation), are resolved during CSG parsing through the analysis of the neighbors surrounding the ambiguous word in question.

Acknowledgements

The research work reported in this paper was supported by Fundo para o Desenvolvimento das Ciências e da Tecnologia (Science and Technology Development Fund) under grant 041/2005/A and by the Research Committee of University of Macau under grant CATIVO:2372.

References

[1] Ballesteros, L., Croft, W. B., "Dictionary-based Methods for Cross-Lingual Information Retrieval", Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications.
[2] Bennett, W. and Slocum, J., "The LRC Machine Translation System", Computational Linguistics, Vol. 11, No. 2-3.
[3] Peter F. Brown, John Cocke, Stephen Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin, "A Statistical Approach to Machine Translation", Computational Linguistics, Vol. 16, No. 2.
[4] Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert Mercer, "The mathematics of statistical machine translation: parameter estimation", Computational Linguistics, Vol. 19, No. 2.
[5] Ralf D. Brown, "Example-Based machine translation in the Pangloss system", Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), Copenhagen, Denmark.
[6] Melamed I. D., "Multitext Grammars and Synchronous Parsers", Proceedings of NAACL/HLT 2003, Edmonton.
[7] S. M. Shieber and Y. Schabes, "Synchronous Tree-Adjoining Grammars", Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), Helsinki, Finland.
[8] A. Abeillé, Y. Schabes, A. Joshi, "Using lexicalized TAGs for machine translation", Proceedings of the 13th International Conference on Computational Linguistics (COLING-90), Helsinki, Finland, Vol. 3, pp. 1-7.
[9] Seki, H., Matsumura, T., Fujii, M., Kasami, T., "On multiple context-free grammars", Theoretical Computer Science, Vol. 88, No. 2.
[10] Wong F., Hu D. C., Mao Y. H., Dong M. C., and Li Y. P., "Machine Translation Based on Constraint-Based Synchronous Grammar", Proceedings of the Second International Joint Conference on Natural Language (IJCNLP-05), Vol. 3651, Jeju Island, Republic of Korea.
[11] Tomita, M., "An efficient augmented-context-free parsing algorithm", Computational Linguistics, Vol. 13, No. 1-2.
[12] Zhang HP, Liu Q., "Model of Chinese words rough segmentation based on N-shortest-paths method", Journal of Chinese Information Processing, Vol. 16, No. 5, pp. 1-7.
[13] Leong K. S., Wong F., Tang C. W., and Dong M. C., "CSAT: A Chinese Segmentation and Tagging Module Based on the Interpolated Probabilistic Model", Proceedings in Computational Methods in Engineering and Science (EPMESC-X), Sanya, Hainan, China.
[14] Rabiner L., "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, Vol. 77, No. 2.
[15] K. Ronald, "The Formal Architecture of Lexical-Functional Grammar", Journal of Information Science and Engineering, Vol. 5.
[16] Earley J., "An Efficient Context-Free Parsing Algorithm", Communications of the Association for Computing Machinery, Vol. 13, No. 2, 1970.

