An Entity-Relation Approach to Information Retrieval 1
Antonio Ferrández, Julio Martínez and Jesús Peral
Dept. Languages and Information Systems, University of Alicante
Carretera San Vicente S/N, Alicante, SPAIN
{antonio, jmartinez, jperal}@dlsi.ua.es

Abstract

In this paper, a novel indexing model for IR is proposed. It aims to overcome the problems of traditional bag-of-words approaches by indexing the entities in the documents and the relations between these entities, which are obtained from the clauses and the anaphoric relations. The model has been evaluated on the Los Angeles Times collection. Compared with the vectorial model, it obtains a 12% increase in average precision and a 13% increase in R-Precision.

1 Introduction

In the literature, Natural Language Processing (NLP) techniques have been reported to show no significant improvement in retrieval performance, although it seems clear that they may overcome the inadequacies of purely quantitative methods of text Information Retrieval (IR), i.e., statistical full-text retrieval or bag-of-words representations. Examples of attempts to overcome these inadequacies can be found in the work of Strzalkowski (e.g. Strzalkowski, 1999). As they say, one possible explanation is that the syntactic analysis is just not going far enough. Or perhaps more appropriately, that the semantic uniformity predictions made on the basis of syntactic structures are less reliable than we have hoped for. Of course the relatively low quality of parsing may be a major problem, although there is little evidence to support that.

In this paper, we propose a novel IR model that incorporates NLP techniques such as POS tagging and partial parsing to improve on the traditional bag-of-words representations. This model indexes entities and the relations between these entities.
These relations are based on the clause splitting of the document and on the resolution of anaphora between these entities. In this way, we improve on other approaches that use this kind of knowledge, such as the work of Zhai et al. (1996), in which only sets of nouns and/or adjectives are indexed through the vector space retrieval model, because there the relations between entities are not considered.

In the following section, the intuitive view of the proposed model is presented. This is followed by its implementation in a computational system, which is finally evaluated on the Los Angeles Times collection and compared with the vectorial model.

Footnote 1: This paper has been partially supported by the Spanish Government (CICYT) project number TIC C02-02 and (PROFIT) project number FIT. Copyright held by the author.

2 The intuitive model

The model proposed in this paper tries to overcome the problems of traditional bag-of-words approaches by extracting the entities in the documents. The entities are obtained from the syntactic knowledge of the document, i.e., the complex noun phrases (NPs) that are parsed. The NPs should be complex enough to capture all the information about each entity, i.e., they may contain relative clauses, appositions, coordinated PPs and coordinated adjectives. These NPs interact with each other by means of clauses, whose main head is the verb, as shown in Figure 1. In this figure, sentence (1) is represented by four entities: "Peter's son", "Peter", "the garden" and "flowers". These entities interact with each other through two clauses whose heads are "stay" and "catch" respectively. In clause 1, entity 1 appears as the agent (see footnote 2) or subject, and entity 3 as the modifier, since it is included in a prepositional phrase (PP).

(1) Peter's son stayed in the garden. He was catching flowers.

ENTITY 1   Identifier: V   Head: son (sing, masc, third)   MODIFIER: Peter
ENTITY 2   Identifier: X   Head: Peter (sing, masc, third)
ENTITY 3   Identifier: Y   Head: garden (sing, masc, third)
ENTITY 4   Identifier: Z   Head: flower (plural, fem, third)

CLAUSE 1   Sentence_ID: 1   ACTION: stay    AGENT: V   MODIFIER: Cat: PP, Id: Q, Prep: in, ENTITY: Y
CLAUSE 2   Sentence_ID: 2   ACTION: catch   THEME: Z   AGENT: Cat: PRONOUN, Type, Num, Gend, Person, Head, ENTITY: V (COREFERENCE CHAIN)

Figure 1. IRS intuitive model: entities vs. clauses in sentence (1).

Moreover, these entities interact with other entities by means of anaphora, which is defined by Hirst (1981) as the device, in discourse, of making an abbreviated reference to some entity or entities, in the expectation that the receiver of the discourse will be able to dis-abbreviate it and determine the identity of the entity. For example, in Figure 1, the pronoun "he" allows entity 1 to interact with entity 4 through the verb of clause 2.

The anaphoric relations between entities also make it possible to capture more information about the entities themselves. For example, let us suppose that sentence (2) occurs after sentence (1).
In this case, the information that Peter is Jane's husband is added to the previous information about entity 2.

(2) Peter, Jane's husband, called his son.

The model proposed in this paper overcomes the drawbacks of the bag-of-words approaches, because it does not index independent words, but entities and their relations. In this way, our approach also improves on other IR approaches that use NLP, because we do not index just contiguous words as pairs, ternary expressions or phrases. Instead, we index whole entities, adding the new information that is presented at different points of the document by resolving anaphora. Therefore, if a query asks for information about Peter as husband of Susan, this document will not be returned.

Footnote 2: In Figure 1, the clauses store the semantic roles ACTION, AGENT, THEME and MODIFIER, which correspond to the verb, subject, object and prepositional phrases of the clause respectively.

3 The implementation of the intuitive model

In order to implement the model proposed in the previous section, we have worked on the output of the computational system called Slot Unification Parser for Anaphora Resolution (SUPAR). This system, which was presented in Ferrández et al. (1999), resolves anaphora in both English and Spanish texts, although it can easily be extended to other languages (see footnote 3). SUPAR works on the output of a POS tagger and partially parses the text. It partially parses coordinated NPs, coordinated PPs, verbal phrases and conjunctions, where NPs can include relative clauses, appositions, coordinated PPs and coordinated adjectives. Conjunctions are used to split sentences into clauses. An example of the parsing process and of the detection of noun phrase entities in a sentence can be observed in (3), where 10 entities have been extracted.

(3) [[David R. Marples's]1 new book, his second on [the Chernobyl accident of [April 26, 1986]2 ]3 ]4, is [a shining example of [the best type of [non-Soviet analysis into [topics]5 ]6 ]7 ]8 that only recently were [absolutely taboo in [Moscow official circles]9 ]10.

The output of SUPAR is stored in three tables: ER, PP and CC.
The first one stores the entities and the relations between entities in the document, where each entity corresponds to a noun phrase and each relation corresponds to a clause whose head is a verb. The second one stores the entities that appear in a prepositional phrase, together with the preposition. Finally, the third one stores the clauses represented in Figure 1.

Table 1 shows the representation of sentence (1). As well as the document identification, each table also stores the frequency of each entity. For example, the frequency of the "son" entity is 2 because it appears as the subject of clauses 1 and 2 (due to the pronoun resolution), whereas the frequency of the remaining entities is 1. In addition to the frequency of each entity, the number of documents in which the entity appears is also stored.

With regard to the ER table, when sentence (2) occurs, the anaphora resolution process adds the modifiers "Jane's husband" to the entity "Peter". Therefore, this entity remains as Peter [husband, Jane], and its frequency is set to 2. However, if the entity "John's son" appears in the document, then a new entity with the head "son" is indexed, because the modifiers do not match. In Table 1, the record "son" will store two different entities: [[Peter], [John]], although there is only one frequency for both entities, i.e., its frequency is assigned equal to 2. Suppose now that the sentence "Peter's son is black" appears. Since the verb of the clause is copulative, a new characteristic of the entity "Peter's son" is added (black), so the modifiers of the "son" entity become [[black, Peter], [John]]. In conclusion, whenever the modifiers of the new entity are included in a list of modifiers of an entity previously stored in the table, the new modifiers are added to that list. Otherwise, a new list of modifiers is stored.

Footnote 3: The SUPAR system can be tested in [...]. It resolves English pronominal anaphora with a success rate of 74%, and Spanish pronominal anaphora with a success rate of 81%.

ER table
Head       Modifiers
son        [Peter]
Peter      []
garden     []
flowers    []
stayed     [garden, Peter, son]
catching   [flowers, Peter, son]

PP table
Preposition   NP head
in            garden

CC table
Verb       Subject NP head   Objects
stayed     son               [garden]
catching   son               [flowers]

Table 1. Tables used to represent the entities in the sentence (1).

In the PP table, only the head of the NP is stored, and when there are several heads, a new entry is stored for each one. For example, for the PP "for books and cigars", two entries are stored: "for books" and "for cigars". In the CC table, omitted subjects are also detected thanks to the clause splitting, as in "Ross carefully folded his trousers and climbed into bed", where "Ross" is also included as the subject of the second clause.

In this way, different tree structures are normalized into the same entity, e.g. "Chinese communist invasion", "invasion of communist Chineses", "invasion of communists of China", "invasion of communists that are from China", "invasion of communists that are Chinese", "invasion of Chinese communists" and "invasion of Chineses that are communists" are all conflated into the entity invasion [Chin--, communist]. Moreover, thanks to the anaphora resolution process, this relation can also be captured when the entities appear separately.

The user's query is processed in a similar way, and the three resulting tables (ER, PP and CC) are compared with the tables of each document. Each table is therefore used as in the vectorial model. The similarity measure is the one proposed in Kaszkiel et al. (1999), although this measure is refined for the ER table, where the traditional vectorial weights are multiplied by the factors (F) in Table 2.
These factors depend on the list of modifiers of the entity stored in the table (MT), the modifiers of the entity that appears in the user's query (MU) and the number of common modifiers between both lists (Common).

MT = table modifiers; MU = user's query modifiers; F = factor

Condition                        F
MT MU = []                       1.3
MT = []                          0
MU = []                          2.1
(MU MT) (MU MT [])               2.2 * log(common+1)
(MT MU) (MU MT [])               1.6 * log(common+1)
(Common 0) (MU MT) (MT MU)       1.4 * log(common+1)
Common = 0                       1.1

Table 2. Factors in the table ER.
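As a rough illustration of the ER-table update rule described earlier in this section, the following Python sketch keeps, for each head, a single frequency and a list of alternative modifier sets. The names `ERRecord` and `add_mention`, the split between the modifiers that identify a mention and the newly learned ones (`extra`), and the choice to increment the frequency on every mention are all assumptions made for this example; they are not taken from SUPAR.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ERRecord:
    """Hypothetical ER-table row: one head, one frequency, several modifier lists."""
    frequency: int = 0
    modifier_lists: List[Set[str]] = field(default_factory=list)


def add_mention(er_table: Dict[str, ERRecord], head: str,
                modifiers: Iterable[str], extra: Iterable[str] = ()) -> None:
    """Index one mention of `head`.

    If the mention's modifiers are included in a modifier list already stored
    for this head, the new information (`extra`) is added to that list.
    Otherwise a new modifier list is stored.
    """
    record = er_table.setdefault(head, ERRecord())
    record.frequency += 1  # assumption: every mention counts once
    mods, new_info = set(modifiers), set(extra)
    for known in record.modifier_lists:
        if mods <= known:          # modifiers included in a stored list
            known |= new_info      # extend that list in place
            return
    record.modifier_lists.append(mods | new_info)


er: Dict[str, ERRecord] = {}
add_mention(er, "son", ["Peter"])                    # "Peter's son stayed ..."
add_mention(er, "son", ["Peter"])                    # "He" resolved to Peter's son
add_mention(er, "Peter", [])                         # "Peter" inside "Peter's son"
add_mention(er, "Peter", [], ["husband", "Jane"])    # sentence (2), apposition
add_mention(er, "son", ["John"])                     # "John's son": no match
add_mention(er, "son", ["Peter"], ["black"])         # "Peter's son is black"
```

After these calls, the `son` record holds two modifier sets, one containing black and Peter and one containing John, while the `Peter` record holds a single set containing husband and Jane.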
4 Evaluation

Several experiments have been carried out to measure the improvement of our proposal with regard to the vectorial model as proposed in Kaszkiel et al. (1999). We have worked with the Cross-Language Evaluation Forum (CLEF) queries. Specifically, we have used queries 41 to 90 for the evaluation results presented in this section, whereas we have used the remaining queries for the training of the system, in order to obtain the factors of Table 2. The corpus on which these experiments were carried out is the Los Angeles Times collection, that is, a set of 113,005 articles from this English-language newspaper (approximately 425 MB). For each query, 1,000 documents are returned. Finally, we have only used the short version of these queries (i.e., the title and description fields).

During the training process, the best factor values were obtained, specifically those in Table 2. Moreover, these best results were obtained when the stem of the lemma was used, where the lemma was obtained from the Tree-Tagger.

Figure 2. Precision vs. Recall (series: ER, PP, CC, Vectorial).

The obtained results are shown in Figure 2 and Figure 3. Figure 2 shows the interpolated recall-precision averages obtained with each independent table (ER, PP, CC) in comparison with the vectorial model. It can be observed that only the ER table obtains better results than the vectorial model, specifically a 12% increase in average precision and a 13% increase in R-Precision. The PP and CC tables always obtain low results. Therefore, we should improve them in order to obtain better results when they are used jointly with the ER table. Figure 3 shows the precision at N documents when only the ER table is used; the improvement over the vectorial model when only 5 documents are returned should be noted.
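The measures reported in this section have standard definitions, which can be sketched in Python as follows. The function names and the toy ranking are assumptions for illustration only, and the non-interpolated form of average precision is used here; none of the figures reported above were produced by this code.

```python
from typing import List, Set


def precision_at(ranking: List[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n returned documents that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    return precision_at(ranking, relevant, r) if r else 0.0


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document
    retrieved, divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy example: five returned documents, three relevant ones (d5 is never returned).
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

p_at_5 = precision_at(ranking, relevant, 5)    # 2 relevant among 5 returned
r_prec = r_precision(ranking, relevant)        # 1 relevant among the top 3
avg_p = average_precision(ranking, relevant)   # (1/2 + 2/4) / 3
```

In an evaluation like the one above, each measure would be computed per query and then averaged over the evaluation queries.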
Figure 3. Precision vs. N docs (series: Vectorial, IRS; N = 5, 10, 15, 20 and 30 docs).

5 Conclusion

In this paper, we have proposed a novel indexing model for IR. This model tries to overcome the problems of traditional bag-of-words approaches by indexing the entities in the documents and the relations between these entities. The entities are obtained from the partial parsing of the document. These entities interact with each other by means of clauses and anaphoric relations. In the implementation of this model, we have used three tables, ER, PP and CC, in which we have stored the entities and their relations, the prepositional phrases, and the clauses of the documents. These tables are used in a similar way to the traditional vectorial model, using the similarity measure proposed in Kaszkiel et al. (1999).

This model has been evaluated on the short CLEF queries (queries 41 to 90 for the evaluation process, and the previous ones for the training process) and on the Los Angeles Times collection. The obtained results have been compared with the vectorial model: a 12% increase in average precision and a 13% increase in R-Precision have been obtained when only the ER table is used.

As future work, the authors will try to improve the performance of the PP and CC tables in order to combine them with the ER table. Moreover, we expect to evaluate this model on the narrative version of these queries, where we expect to obtain better results due to their length (i.e., a greater number of clauses and noun phrases), because long and descriptive queries usually responded well to NLP, while terse one-sentence search directives showed hardly any improvement.

6 References

Zhai, C., Tong, X., Milic-Frayling, N. and Evans, D. A. (1996). Evaluation of Syntactic Phrase Indexing - CLARIT NLP Track Report.
In Proceedings of the Fifth Text REtrieval Conference (TREC-5).
Ferrández, A., Palomar, M. and Moreno, L. (1999). An empirical approach to Spanish anaphora resolution. Machine Translation, 14(3/4).
Hirst, G. (1981). Anaphora in Natural Language Understanding. Berlin: Springer-Verlag.
Kaszkiel, M., Zobel, J. and Sacks-Davis, R. (1999). Efficient passage ranking for document databases. ACM Transactions on Information Systems, 17(4).
Strzalkowski, T. (1999). Natural Language Information Retrieval. Kluwer Academic Publishers.
BASIC ENGLISH Book 1 GRAMMAR Anne Seaton Y. H. Mew Book 1 Three Watson Irvine, CA 92618-2767 Web site: www.sdlback.com First published in the United States by Saddleback Educational Publishing, 3 Watson,
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationThe Discourse Anaphoric Properties of Connectives
The Discourse Anaphoric Properties of Connectives Cassandre Creswell, Kate Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi Λ, Bonnie Webber y Λ University of Pennsylvania 3401 Walnut Street Philadelphia,
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationConstruction Grammar. University of Jena.
Construction Grammar Holger Diessel University of Jena holger.diessel@uni-jena.de http://www.holger-diessel.de/ Words seem to have a prototype structure; but language does not only consist of words. What
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationCase government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG
Case government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG Dr. Kakia Chatsiou, University of Essex achats at essex.ac.uk Explorations in Syntactic Government and Subcategorisation,
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist
Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationPerformance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
Journal of Computer and Communications, 2016, 4, 79-89 Published Online August 2016 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2016.410009 Performance Analysis of Optimized
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1
Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up
More informationIntroduction to Text Mining
Prelude Overview Introduction to Text Mining Tutorial at EDBT 06 René Witte Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe, Germany http://rene-witte.net
More informationInleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3
Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection
More informationUSER ADAPTATION IN E-LEARNING ENVIRONMENTS
USER ADAPTATION IN E-LEARNING ENVIRONMENTS Paraskevi Tzouveli Image, Video and Multimedia Systems Laboratory School of Electrical and Computer Engineering National Technical University of Athens tpar@image.
More informationSCHEMA ACTIVATION IN MEMORY FOR PROSE 1. Michael A. R. Townsend State University of New York at Albany
Journal of Reading Behavior 1980, Vol. II, No. 1 SCHEMA ACTIVATION IN MEMORY FOR PROSE 1 Michael A. R. Townsend State University of New York at Albany Abstract. Forty-eight college students listened to
More informationHLTCOE at TREC 2013: Temporal Summarization
HLTCOE at TREC 2013: Temporal Summarization Tan Xu University of Maryland College Park Paul McNamee Johns Hopkins University HLTCOE Douglas W. Oard University of Maryland College Park Abstract Our team
More informationControlled vocabulary
Indexing languages 6.2.2. Controlled vocabulary Overview Anyone who has struggled to find the exact search term to retrieve information about a certain subject can benefit from controlled vocabulary. Controlled
More informationThe Role of the Head in the Interpretation of English Deverbal Compounds
The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationPossessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand
1 Introduction Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand heidi.quinn@canterbury.ac.nz NWAV 33, Ann Arbor 1 October 24 This paper looks at
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationCONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS
CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS Pirjo Moen Department of Computer Science P.O. Box 68 FI-00014 University of Helsinki pirjo.moen@cs.helsinki.fi http://www.cs.helsinki.fi/pirjo.moen
More informationArgument structure and theta roles
Argument structure and theta roles Introduction to Syntax, EGG Summer School 2017 András Bárány ab155@soas.ac.uk 26 July 2017 Overview Where we left off Arguments and theta roles Some consequences of theta
More informationI N T E R P R E T H O G A N D E V E L O P HOGAN BUSINESS REASONING INVENTORY. Report for: Martina Mustermann ID: HC Date: May 02, 2017
S E L E C T D E V E L O P L E A D H O G A N D E V E L O P I N T E R P R E T HOGAN BUSINESS REASONING INVENTORY Report for: Martina Mustermann ID: HC906276 Date: May 02, 2017 2 0 0 9 H O G A N A S S E S
More informationZero Pronominal Anaphora Resolution for the Romanian Language
Zero Pronominal Anaphora Resolution for the Romanian Language Claudiu Mihăilă 1,, Iustina Ilisei 2, and Diana Inkpen 3 1 Faculty of Computer Science, Al.I. Cuza University of Iaşi, 16 General Berthelot
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationCommon Core State Standards for English Language Arts
Reading Standards for Literature 6-12 Grade 9-10 Students: 1. Cite strong and thorough textual evidence to support analysis of what the text says explicitly as well as inferences drawn from the text. 2.
More informationMultilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities
Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB
More informationCopyright 2017 DataWORKS Educational Research. All rights reserved.
Copyright 2017 DataWORKS Educational Research. All rights reserved. No part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic or mechanical,
More informationMachine Learning from Garden Path Sentences: The Application of Computational Linguistics
Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,
More informationGeorgetown University at TREC 2017 Dynamic Domain Track
Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain
More informationCourse Outline for Honors Spanish II Mrs. Sharon Koller
Course Outline for Honors Spanish II Mrs. Sharon Koller Overview: Spanish 2 is designed to prepare students to function at beginning levels of proficiency in a variety of authentic situations. Emphasis
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationEmmaus Lutheran School English Language Arts Curriculum
Emmaus Lutheran School English Language Arts Curriculum Rationale based on Scripture God is the Creator of all things, including English Language Arts. Our school is committed to providing students with
More information