Towards Building a WordNet for Vietnamese


Ho Ngoc Duc
Information Technology Institute, Vietnam National University
144 Xuan Thuy, Ha Noi
ducna@vnu.edu.vn

Nguyen Thi Thao
Communication Network Center, Hanoi University of Technology
1 Dai Co Viet Road, Ha Noi

Abstract: We report on our ongoing effort towards developing VietWordNet, a WordNet for the Vietnamese language. We present the methodology we used, the lexical resources we employed, and the computing tools we designed to help acquire and filter lexical and semantic information from available machine-readable dictionaries and other resources.

Key Words: WordNet, Ontology, Language Engineering

1 Introduction

WordNet [4, 8] is a broad-coverage lexical-semantic net for the English language that has been developed at Princeton University since about …. Its design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. The synonym sets (or synsets) are linked via various relations, such as antonymy (e.g., rise and fall), hyponymy (the is-a relation, e.g., bird is a hyponym of animal), and meronymy (the has-a or part-whole relation, e.g., body and hand). WordNet has proved very useful for many tasks in Natural Language Processing, e.g., parsing and machine translation, or the treatment of syntactic and semantic ambiguity. Moreover, attempts have been made to exploit the Princeton WordNet database in Information Retrieval, in particular by applying the synonymy relation represented via the so-called synsets. Among other things, researchers have tried to enrich queries with semantically related terms and to compare queries and documents via conceptual distance measures. The success of WordNet has inspired several projects that aim to construct WordNets for languages other than English [5, 7] or to develop multilingual WordNets.
Perhaps the most important project in this line is EuroWordNet [11], which aims at building WordNets for several European languages (Dutch, Italian, Spanish, etc.).

As more and more Vietnamese texts become available in electronic form and new Web pages in Vietnamese emerge every day, there is a pressing need to create large-scale online lexical-semantic nets for the Vietnamese language for use in NLP, Information Retrieval, and other areas. Our goal is to develop VietWordNet, a WordNet for the Vietnamese language along the lines of Princeton WordNet. As the manual construction of VietWordNet from scratch would be very costly and time-consuming, we have focused on techniques to semi-automatically construct the synsets and the relations between them from existing structured lexical resources. In this paper we present the methodology we used, the lexical resources we employed, and the computing tools we designed to help acquire and filter lexical knowledge and semantic information from available dictionaries and other resources.

The rest of the paper is organized as follows. In the next section we describe our approach to building the core of VietWordNet, its nominal part, which consists of an inheritance system of Vietnamese nouns. Then we give an overview of the available resources, the tools we can employ, and the programs we have to develop to process these resources. We discuss the steps necessary to construct the nominal part of VietWordNet and some methods to carry out these steps automatically. Finally, we discuss our results and some ideas for future work.

2 Methodology

WordNet treats the four major syntactic categories (Noun, Verb, Adjective, Adverb) separately. We chose nouns as our starting point for building VietWordNet. This choice is justified by the crucial role that nouns play in the mental lexicon.
Nouns are the most common words; they are hierarchically organized as an inheritance system, and many relationships between synsets apply only to nouns [9].

In the first phase, we are creating a hierarchy of Vietnamese nouns. The necessary steps are as follows. We start with a list of Vietnamese nouns. The meanings of these words are then organized into synsets, i.e., we need to identify the meanings of every single word and then group the word meanings into sets of synonyms. The next step is to create relationships between the synsets. In the first phase we only consider the hypernymy/hyponymy relation, so the result is a taxonomy of concepts. Finally, we attach to each synset a gloss explaining the meanings of the words in the set.
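The data model implied by these steps (synsets that hold synonymous words and a gloss, linked by hypernymy into a taxonomy) can be sketched in Java, the language chosen for the VietWordNet tools. This is a minimal illustration under stated assumptions, not the VietWordNet implementation: the class names `Synset` and `Taxonomy`, the restriction to a single hypernym per synset, and the edge-counting distance (one simple choice among many conceptual distance measures; the paper does not name one) are all assumptions. The fragment loosely follows the animal example of Section 4 (động vật, bò, trâu); the exact tree shape is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A synset: synonymous words, a gloss, and (in this sketch) at most one hypernym. */
class Synset {
    final Set<String> words = new LinkedHashSet<>();
    String gloss = "";
    Synset hypernym; // null for a root concept

    Synset(String... ws) {
        words.addAll(List.of(ws));
    }

    /** This synset followed by all of its hypernyms, up to the root. */
    List<Synset> hypernymChain() {
        List<Synset> chain = new ArrayList<>();
        for (Synset s = this; s != null; s = s.hypernym) {
            chain.add(s);
        }
        return chain;
    }

    /** True if 'other' is this synset or one of its transitive hypernyms. */
    boolean isA(Synset other) {
        return hypernymChain().contains(other);
    }
}

public class Taxonomy {
    /** Builds a tiny hypothetical fragment of a nominal hierarchy. */
    static Map<String, Synset> buildExample() {
        Map<String, Synset> t = new LinkedHashMap<>();
        t.put("animal", new Synset("động vật"));
        t.put("land animal", new Synset("động vật trên đất liền"));
        t.put("herbivore", new Synset("động vật ăn cỏ"));
        t.put("cow", new Synset("bò"));
        t.put("buffalo", new Synset("trâu"));
        t.get("land animal").hypernym = t.get("animal");
        t.get("herbivore").hypernym = t.get("land animal");
        t.get("cow").hypernym = t.get("herbivore");
        t.get("buffalo").hypernym = t.get("herbivore");
        return t;
    }

    /**
     * Edge-counting distance: hypernym links from a and from b up to their lowest
     * common hypernym, or -1 if they share none. Correct for a tree-shaped hierarchy.
     */
    static int pathDistance(Synset a, Synset b) {
        List<Synset> upA = a.hypernymChain();
        List<Synset> upB = b.hypernymChain();
        for (int j = 0; j < upB.size(); j++) {
            int i = upA.indexOf(upB.get(j));
            if (i >= 0) {
                return i + j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Synset> t = buildExample();
        System.out.println(t.get("cow").isA(t.get("animal")));            // true
        System.out.println(pathDistance(t.get("cow"), t.get("buffalo"))); // 2
    }
}
```

Once Vietnamese words hang on such nodes, a hypernymy check such as "bò is a kind of động vật" reduces to walking the hypernym chain.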

To identify the different meanings of a Vietnamese noun, we primarily use a Vietnamese-English dictionary, sometimes consulting monolingual dictionaries. The steps of creating the synsets and the links among them rely heavily on the English WordNet. To create the nominal hierarchy for VietWordNet, we try to attach Vietnamese nouns to English WordNet synsets, using the Vietnamese-English and English-Vietnamese bilingual dictionaries. Finally, the glosses defining the synsets can be constructed using the monolingual Vietnamese dictionaries.

Our approach is based on the following hypothesis. Although Vietnamese and English are very different languages, the inheritance systems of nominal concepts in the two languages are similar, at least in certain domains. More precisely, we expect the hierarchies of nouns denoting concrete objects to be similar to those in WordNet. For abstract concepts, however, we expect gaps, because Vietnamese lacks many words expressing abstract concepts. This lack also results in other differences between the two languages. For example, the hypernym trees in VietWordNet can be expected to be shallower than the corresponding ones in English WordNet, and on average the synsets of VietWordNet will also contain fewer words. To fill the gaps caused by the lack of Vietnamese words for certain concepts, we resort to collocations (multi-word translations; in the case of nouns, nominal phrases) where necessary.

Our approach is related to work on constructing WordNets for other languages semi-automatically. Based on the conceptual similarity between English and other European languages, the skeletons of the Spanish and Catalan WordNets were constructed in the same way [1, 3]. A similar effort to build a Korean WordNet is reported in [7]. The main differences between those works and ours lie in how the various heuristics are applied to map Vietnamese word senses to WordNet synsets.
The characteristics of the Vietnamese language and the lack of reliable lexical resources in electronic form raise problems that call for new solutions.

3 Resources

We use two different kinds of resources in building the nominal core of VietWordNet: lexical resources (existing WordNets and machine-readable dictionaries, MRDs) and programs for processing these lexical resources.

3.1 Lexical resources

A variety of broad-coverage linguistic resources (dictionaries, thesauri, text corpora, etc.) are publicly available for many European languages (German, Spanish, French, ...). They have proved very useful for the construction of WordNets for those languages. For Vietnamese, however, few large-scale linguistic resources exist, and very few are publicly available in computer-readable form. The MRDs that are available for Vietnamese were created with the human user in mind, i.e., they focus on presentation rather than structure, so they are not easily processed by computers. A human user can easily recognize the structure of a dictionary entry (e.g., headword, definitions, examples) from its typographical properties (text size, color, style: bold, italic, etc.), but for a computer, a parser must be built to make the structure explicit. This task is often very difficult because the dictionaries are in different formats, and even within one dictionary the formats of the entries are not uniform. For instance, many entries contain part-of-speech (POS) information but others do not, so it is not always possible to determine the POS of a word from a dictionary.

As mentioned previously, the most important lexical resources we use are the English WordNet and the English-Vietnamese and Vietnamese-English bilingual dictionaries. We also use some monolingual (Vietnamese) dictionaries. Version … of Princeton WordNet contains mappings between nouns and synsets.
Since WordNet was designed from the beginning to be used on a computer, it is fully computer-processable. The bilingual English/Vietnamese dictionaries that we use for building VietWordNet were developed in the Free Online Vietnamese Dictionary Project, a project initiated in 1997 by one of the authors and other Vietnamese researchers. Its principal goals are to help Vietnamese Internet users access foreign-language resources and to support teaching and learning the Vietnamese language over the Internet. In that project, several monolingual and bilingual dictionaries (English-Vietnamese, French-Vietnamese, etc.) have been compiled, digitized and integrated into a single extensible system. The English-Vietnamese dictionary contains about … entries, of which circa … are nouns. The Vietnamese-English dictionary contains around … entries, including about … nouns.

Besides those resources, we have at our disposal a larger Vietnamese-English dictionary and 4 monolingual Vietnamese dictionaries. These dictionaries are not very well structured and are not easily parsed, so their use is currently quite limited, and we decided to use them at a later stage. The larger Vietnamese-English dictionary contains about … entries, of which about … are positively identified as nouns. The monolingual dictionaries contain about … entries. Also intended for use in a later phase, to enrich the synsets of VietWordNet, are the bilingual French/Vietnamese dictionaries from the Free Online Vietnamese Dictionary Project. These dictionaries are relatively well structured and contain around … entries in each direction.

3.2 Computing tools

For accessing WordNet we could have used the program code available in the WordNet package, either in C or in Prolog. However, we decided to develop all tools to construct and access VietWordNet in Java, for the following reasons. First, Java is platform-independent, so the programs we create are immediately available on all platforms without porting. Second, Java makes it easy to turn the programs into a Web-based application that runs on all operating systems, so that VietWordNet can quickly be made available on the Web for evaluation and validation by a large group of users. Third, Java offers built-in support for Unicode, which is essential for processing Vietnamese. Moreover, a WordNet browser can easily be built in Java using the Swing library. We therefore adopted JWNL (Java WordNet Library, [6]), a free third-party Java API for accessing WordNet, which ensures smooth integration with our own programs.

The heterogeneity of the Vietnamese resources makes it necessary to develop a set of computing tools to process them. We have implemented programs to parse the entries of the bilingual English/Vietnamese and French/Vietnamese dictionaries from the Free Online Vietnamese Dictionary Project. They transform each dictionary entry into a structure consisting of the headword, the pronunciation, the POS, the translations, usage codes, examples, etc. Preliminary versions of tools for mapping nouns from the dictionaries to English WordNet synsets have also been implemented. Parsers for the larger Vietnamese-English dictionary and for the monolingual Vietnamese dictionaries are currently under development. Moreover, we have created a tool to identify words in a Vietnamese text. We explain the use of some of these tools in the next section.
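The kind of structure these parsers produce can be illustrated with a small Java sketch. The real entry format of the Free Online Vietnamese Dictionary Project files is not described in this paper, so the line-oriented markup used here is a hypothetical stand-in: '@' introduces the headword with an optional /pronunciation/, '*' sets the part of speech, and '-' starts one sense whose translations are separated by ';'. The names `EntryParser`, `Entry` and `Sense` are likewise illustrative, usage codes and examples are skipped for brevity, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class EntryParser {
    /** One sense: its part of speech ("" if the entry gives none) and its translations. */
    record Sense(String pos, List<String> translations) {}

    /** A parsed entry: headword, pronunciation ("" if absent), senses in dictionary order. */
    record Entry(String headword, String pronunciation, List<Sense> senses) {
        List<Sense> sensesWithPos(String pos) {
            return senses.stream().filter(s -> s.pos().equals(pos)).toList();
        }
    }

    /** Parses one entry written in the hypothetical '@', '*', '-' line markup. */
    static Entry parse(List<String> lines) {
        String headword = null;
        String pronunciation = "";
        String pos = ""; // current POS label; stays empty if the entry has none
        List<Sense> senses = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                continue;
            }
            String body = line.substring(1).strip();
            switch (line.charAt(0)) {
                case '@' -> {
                    int slash = body.indexOf('/');
                    headword = (slash < 0 ? body : body.substring(0, slash)).strip();
                    if (slash >= 0) {
                        pronunciation = body.substring(slash).replace("/", "").strip();
                    }
                }
                case '*' -> pos = body;
                case '-' -> {
                    List<String> translations = new ArrayList<>();
                    for (String t : body.split(";")) {
                        if (!t.isBlank()) {
                            translations.add(t.strip());
                        }
                    }
                    senses.add(new Sense(pos, List.copyOf(translations)));
                }
                default -> { } // examples, usage codes etc. are ignored in this sketch
            }
        }
        if (headword == null) {
            throw new IllegalArgumentException("entry has no '@' headword line");
        }
        return new Entry(headword, pronunciation, List.copyOf(senses));
    }

    public static void main(String[] args) {
        Entry dong = parse(List.of(
                "@đông",
                "* noun",
                "- east; orient",
                "- winter",
                "* verb",
                "- to coagulate; to congeal; to freeze",
                "* adjective",
                "- crowded; numerous"));
        for (Sense s : dong.sensesWithPos("noun")) {
            System.out.println(s.translations());
        }
    }
}
```

Run on the word đông, `main` prints the translation lists of its two noun senses, `[east, orient]` and `[winter]`. Keeping an empty POS for unlabeled entries, rather than guessing, leaves that decision to the POS heuristics described in the next section.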
4 Methods for constructing VietWordNet

Our approach is semi-automatic, i.e., some tasks are performed automatically, and others must be done manually. The automation relies on several heuristics that were developed in [1, 2, 3, 5, 10] for constructing WordNets for other languages and that exploit the available bilingual and monolingual dictionaries as well as large corpora.

The first step of our procedure is to choose a list of Vietnamese nouns that will constitute the skeleton of VietWordNet and to identify all meanings of those nouns. This step is fully automated. We go through all entries of the Vietnamese-English dictionary and, for each entry whose headword is a noun, create the set of meanings of the headword. Because not all entries in the Vietnamese-English dictionary contain the part of speech, we rely on some heuristics to check whether a word is indeed a noun. If the POS of an entry is unknown, we check whether its headword is contained in a list of Vietnamese nouns. For that task we need a list that comprises almost all Vietnamese nouns; we have extracted such a list from the available monolingual dictionaries. Another way to determine the POS of a Vietnamese word is to check whether its English translations are nouns, by comparing them with a list that covers almost all English nouns (we use the list of nouns contained in WordNet for this purpose). If all single-word translations of a sense of a Vietnamese word are nouns, we take the word to be a noun.

The next step in building VietWordNet is to attach Vietnamese nouns to synsets of the English WordNet. Once the Vietnamese nouns are connected to WordNet synsets, we have achieved two goals. First, we have grouped the meanings of Vietnamese words into synsets. Second, the most important semantic relations, including the hypernymy/hyponymy relation, are transferred to VietWordNet. This approach can be illustrated with an example. Figure 1a depicts a tiny fraction of the inheritance system of English nouns.
This system can help us to establish the hypernymy relation between cow and animal, or to determine the so-called conceptual distance between cow and buffalo. (Princeton WordNet contains much more information, e.g., it also provides the synonyms of the nouns under consideration, but we shall ignore that information for now.) If we can attach the corresponding Vietnamese words to the synsets of Figure 1a, e.g., the words động vật, bò and trâu to the nodes representing the synsets of animal, cow and buffalo, as depicted in Figure 1b, then we have also established the corresponding relationships between the concepts động vật and bò, or between bò and trâu.

[Figure 1a: A fragment of the English noun hierarchy (Animals; Land animals, Sea animals; Herbivorous, Carnivorous, Omnivorous; Cow, Buffalo).]
[Figure 1b: The same fragment with Vietnamese words attached (động vật; động vật trên đất liền, động vật trên biển; động vật ăn cỏ, động vật ăn thịt, động vật ăn tạp; bò, trâu).]

We are experimenting with two different approaches to attach Vietnamese nouns to WordNet synsets. In the first, we use the Vietnamese-English dictionary to translate all Vietnamese nouns. Each Vietnamese word has one or more senses, and the English translations of each sense belong to one or more synsets. Our task is to select the synsets to which we can attach the senses of the Vietnamese noun. The second approach starts with the synsets in the inheritance system of the English WordNet. For each synset, we use the English-Vietnamese dictionary to find Vietnamese translations of the English words in that synset, and selected Vietnamese translations are then attached to that synset.

In both approaches, if the translation is one-to-one (i.e., the bilingual dictionary in question gives only one translation for a certain sense), we can assume with high confidence that the correspondence between word and synset has been established. This correspondence can be double-checked by translating the words back using the other bilingual dictionary. If a word has many translations, we need to apply certain heuristics to exclude false hits and to assign the translations to the correct synsets.

Let us consider a simple example to illustrate the first approach. Consider the Vietnamese word đông. This word can be a noun (east, orient; winter), a verb (to coagulate; to congeal; to freeze), or an adjective (crowded; numerous). As a noun it has two senses. According to our Vietnamese-English dictionary, the first sense has two English translations, East and orient, and the second has one translation, winter. (The different senses and translations can be extracted from the dictionary using the parsers we have developed.) Let us look at the first case. Take the two English nouns East and orient as input. According to WordNet, East has 3 senses and orient has 2. Thus, the first sense of the word đông may be attached to up to 5 synsets. We rank these candidates by a confidence score based on several heuristics, which we discuss later. The second case is easier.
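For the first case, the candidate-collection step can be sketched in Java. The sketch is hypothetical: a small in-memory map stands in for WordNet's noun index (a real tool would query WordNet, for instance through JWNL), and the synset identifiers are made up. Each candidate also records how many of the sense's English translations lead to it. The "up to 5" above counts senses; because, as the text notes next, East and orient share one synset, this example yields 4 distinct candidates, and the shared synset receives a vote from every translation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SynsetCandidates {
    /**
     * Collects every synset containing at least one English translation of a
     * Vietnamese word sense, with the number of translations pointing to it.
     */
    static Map<String, Integer> candidates(List<String> translations,
                                           Map<String, List<String>> nounIndex) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (String t : translations) {
            String key = t.toLowerCase(Locale.ROOT);
            for (String synsetId : nounIndex.getOrDefault(key, List.of())) {
                votes.merge(synsetId, 1, Integer::sum);
            }
        }
        return votes;
    }

    /** Candidate IDs by descending vote count; List.sort is stable, so ties keep their order. */
    static List<String> ranked(Map<String, Integer> votes) {
        List<String> ids = new ArrayList<>(votes.keySet());
        ids.sort((a, b) -> Integer.compare(votes.get(b), votes.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for WordNet's noun index: word -> IDs of its synsets.
        Map<String, List<String>> nounIndex = Map.of(
                "east", List.of("east/orient", "east-2", "east-3"),
                "orient", List.of("east/orient", "orient-2"),
                "winter", List.of("winter/wintertime"));

        Map<String, Integer> first = candidates(List.of("East", "orient"), nounIndex);
        System.out.println(first);         // {east/orient=2, east-2=1, east-3=1, orient-2=1}
        System.out.println(ranked(first)); // [east/orient, east-2, east-3, orient-2]
        System.out.println(candidates(List.of("winter"), nounIndex)); // {winter/wintertime=1}
    }
}
```

The vote count is only a first ranking signal; the further heuristics the paper lists (shared synsets, sense order, usage codes) would adjust it.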
WordNet tells us that the English noun winter has only one sense and belongs to the synset {winter, wintertime}. Thus, we can attach the word đông to that synset.

Previous work has identified various heuristics to aid the automatic construction of WordNets for other languages. In [1, 2], the authors describe a method to extract semantic relationships from a monolingual dictionary and to use this information to construct a hierarchy of concepts. Unfortunately, the structures of the available monolingual dictionaries for Vietnamese are not well suited to that method. To apply it, we need a tool that makes the structure of a definition in the monolingual dictionary explicit, i.e., we need to analyze the definition in order to identify the genus and the distinguishing characteristics of the word being defined. As a preliminary step towards using that heuristic, we have developed a tool to identify Vietnamese words in a sentence.

Some other methods can be applied more easily. A simple one is to check whether all (or most) English translations of a Vietnamese noun constitute a WordNet synset. If so, the word can be attached to that synset. For example, one WordNet synset consists of two words: East and orient. The first sense of the Vietnamese noun đông also has two translations, East and orient. Thus, we may assign this sense to that synset. Another heuristic is based on the assumption that the senses in the dictionaries are ordered by frequency, so that the first sense in the monolingual dictionary corresponds to the synset constructed from the first sense in the Vietnamese-English dictionary; we therefore attach the first definition to that synset. Yet another method is to use the available usage codes. For example, some entries contain information about the semantic domain (Science, History, Economics, ...) or the usage style (technical, slang, vulgar, ...) of the words in question.
Those usage hints can also help to test the compatibility between a Vietnamese word and a WordNet synset.

A last step to complete the nominal core of VietWordNet is to add glosses to the synsets. Currently we simply add a definition from a monolingual dictionary to the synset. This method is most reliable if the synset contains a single word and that word has only one sense. If the words in the synset have several senses, we have to rely on various heuristics. As none of the heuristic methods discussed is fully reliable, we have to combine the results of several methods to achieve high accuracy.

5 Conclusions and future work

In this paper we have explored the semi-automatic construction of a WordNet for the Vietnamese language using pre-existing lexical resources such as Princeton WordNet, Vietnamese/English bilingual dictionaries, and monolingual Vietnamese dictionaries. We have analyzed the available resources, evaluated the applicability of several heuristics to them, and designed a set of tools to make these resources suitable for building a core VietWordNet.

In the future we need to improve the tools to make better use of the available resources, so that we can test and compare the various heuristics more adequately. In particular, we need to implement a parser for the larger Vietnamese-English dictionary so that we can construct a larger VietWordNet. We also need tools to process the monolingual dictionaries. Another issue is to make the programs more robust, so that they can cope with typographical errors in the resources.

The construction of VietWordNet cannot be fully automated; a step of testing and validation by human experts is necessary. To aid human users in this work, we are going to design and implement a set of visual tools that let users view and modify portions of VietWordNet, so that they can add words to or delete words from synsets, create or delete synsets, or modify relations between synsets.
We intend to integrate such tools into a Web-based application, so that online collaboration between various groups can be achieved more easily.

Acknowledgements

We thank Mr. Ho Hai Thuy for providing us with many essential lexical resources. We have also benefited greatly from discussions with him about Vietnamese lexicography. We thank all our friends in the Free Online Vietnamese Dictionary Project who have contributed to the construction of the electronic English-Vietnamese and Vietnamese-English dictionaries.

References

[1] Atserias, J. et al., "Combining multiple methods for the automatic construction of multilingual WordNets". In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, Tzigov Chark, 1997.
[2] Copestake, A., "An approach to building the hierarchical element of a lexical knowledge base from a machine readable dictionary". In: Proceedings of the First International Workshop on Inheritance in Natural Language Processing.
[3] Farreres, X., G. Rigau and H. Rodriguez, "Using WordNet for building WordNets". In: Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems, Montreal, 1998.
[4] Fellbaum, C. (Ed.), WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, 1998.
[5] Hamp, B. and H. Feldweg, "GermaNet - a Lexical-Semantic Net for German". In: Proceedings of the ACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources, Madrid, 1997.
[6] JWNL: Java WordNet Library.
[7] Lee, C., G. Lee and J. Seo, "Automatic WordNet mapping using Word Sense Disambiguation". In: Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC 2000), Hong Kong, 2000.
[8] Miller, G. et al., "Five papers on WordNet". In: International Journal of Lexicography 3 (4), 1990.
[9] Miller, G., "Nouns in WordNet: a Lexical Inheritance System". In: International Journal of Lexicography 3 (4), 1990.
[10] Rigau, G., H. Rodriguez and E. Agirre, "Building Accurate Semantic Taxonomies from Monolingual MRDs". In: Proceedings of COLING-ACL '98, Montréal, 1998.
[11] Vossen, P., "EuroWordNet: building a multilingual database with wordnets for European languages". In: The ELRA Newsletter, Vol. 3 (1).


More information

A Semantic Similarity Measure Based on Lexico-Syntactic Patterns

A Semantic Similarity Measure Based on Lexico-Syntactic Patterns A Semantic Similarity Measure Based on Lexico-Syntactic Patterns Alexander Panchenko, Olga Morozova and Hubert Naets Center for Natural Language Processing (CENTAL) Université catholique de Louvain Belgium

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

A Bottom-up Comparative Study of EuroWordNet and WordNet 3.0 Lexical and Semantic Relations

A Bottom-up Comparative Study of EuroWordNet and WordNet 3.0 Lexical and Semantic Relations A Bottom-up Comparative Study of EuroWordNet and WordNet 3.0 Lexical and Semantic Relations Maria Teresa Pazienza a, Armando Stellato a, Alexandra Tudorache ab a) AI Research Group, Dept. of Computer Science,

More information

Ontologies vs. classification systems

Ontologies vs. classification systems Ontologies vs. classification systems Bodil Nistrup Madsen Copenhagen Business School Copenhagen, Denmark bnm.isv@cbs.dk Hanne Erdman Thomsen Copenhagen Business School Copenhagen, Denmark het.isv@cbs.dk

More information

Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction

Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction Intl. Conf. RIVF 04 February 2-5, Hanoi, Vietnam Lexical Similarity based on Quantity of Information Exchanged - Synonym Extraction Ngoc-Diep Ho, Fairon Cédrick Abstract There are a lot of approaches for

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Modeling full form lexica for Arabic

Modeling full form lexica for Arabic Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Objectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition

Objectives. Chapter 2: The Representation of Knowledge. Expert Systems: Principles and Programming, Fourth Edition Chapter 2: The Representation of Knowledge Expert Systems: Principles and Programming, Fourth Edition Objectives Introduce the study of logic Learn the difference between formal logic and informal logic

More information

Ontological spine, localization and multilingual access

Ontological spine, localization and multilingual access Start Ontological spine, localization and multilingual access Some reflections and a proposal New Perspectives on Subject Indexing and Classification in an International Context International Symposium

More information

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt

Outline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic

More information

Collocations of Nouns: How to Present Verb-noun Collocations in a Monolingual Dictionary

Collocations of Nouns: How to Present Verb-noun Collocations in a Monolingual Dictionary Sanni Nimb, The Danish Dictionary, University of Copenhagen Collocations of Nouns: How to Present Verb-noun Collocations in a Monolingual Dictionary Abstract The paper discusses how to present in a monolingual

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Semantic Evidence for Automatic Identification of Cognates

Semantic Evidence for Automatic Identification of Cognates Semantic Evidence for Automatic Identification of Cognates Andrea Mulloni CLG, University of Wolverhampton Stafford Street Wolverhampton WV SB, United Kingdom andrea@wlv.ac.uk Viktor Pekar CLG, University

More information

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight. Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused

More information

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Accuracy (%) # features

Accuracy (%) # features Question Terminology and Representation for Question Type Classication Noriko Tomuro DePaul University School of Computer Science, Telecommunications and Information Systems 243 S. Wabash Ave. Chicago,

More information

Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries

Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Mohsen Mobaraki Assistant Professor, University of Birjand, Iran mmobaraki@birjand.ac.ir *Amin Saed Lecturer,

More information

Part III: Semantics. Notes on Natural Language Processing. Chia-Ping Chen

Part III: Semantics. Notes on Natural Language Processing. Chia-Ping Chen Part III: Semantics Notes on Natural Language Processing Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-Sen University Kaohsiung, Taiwan ROC Part III: Semantics p. 1 Introduction

More information

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities

Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB

More information

Automatic Extraction of Semantic Relations by Using Web Statistical Information

Automatic Extraction of Semantic Relations by Using Web Statistical Information Automatic Extraction of Semantic Relations by Using Web Statistical Information Valeria Borzì, Simone Faro,, Arianna Pavone Dipartimento di Matematica e Informatica, Università di Catania Viale Andrea

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Proceedings of the 19th COLING, , 2002.

Proceedings of the 19th COLING, , 2002. Crosslinguistic Transfer in Automatic Verb Classication Vivian Tsang Computer Science University of Toronto vyctsang@cs.toronto.edu Suzanne Stevenson Computer Science University of Toronto suzanne@cs.toronto.edu

More information

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Masaki Murata, Koji Ichii, Qing Ma,, Tamotsu Shirado, Toshiyuki Kanamaru,, and Hitoshi Isahara National Institute of Information

More information

Extended Similarity Test for the Evaluation of Semantic Similarity Functions

Extended Similarity Test for the Evaluation of Semantic Similarity Functions Extended Similarity Test for the Evaluation of Semantic Similarity Functions Maciej Piasecki 1, Stanisław Szpakowicz 2,3, Bartosz Broda 1 1 Institute of Applied Informatics, Wrocław University of Technology,

More information

2 nd grade Task 5 Half and Half

2 nd grade Task 5 Half and Half 2 nd grade Task 5 Half and Half Student Task Core Idea Number Properties Core Idea 4 Geometry and Measurement Draw and represent halves of geometric shapes. Describe how to know when a shape will show

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview

More information

English Language and Applied Linguistics. Module Descriptions 2017/18

English Language and Applied Linguistics. Module Descriptions 2017/18 English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

Cross-Lingual Text Categorization

Cross-Lingual Text Categorization Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es

More information

The Implementation of Interactive Multimedia Learning Materials in Teaching Listening Skills

The Implementation of Interactive Multimedia Learning Materials in Teaching Listening Skills English Language Teaching; Vol. 8, No. 12; 2015 ISSN 1916-4742 E-ISSN 1916-4750 Published by Canadian Center of Science and Education The Implementation of Interactive Multimedia Learning Materials in

More information

BULATS A2 WORDLIST 2

BULATS A2 WORDLIST 2 BULATS A2 WORDLIST 2 INTRODUCTION TO THE BULATS A2 WORDLIST 2 The BULATS A2 WORDLIST 21 is a list of approximately 750 words to help candidates aiming at an A2 pass in the Cambridge BULATS exam. It is

More information

A Comparison of Two Text Representations for Sentiment Analysis

A Comparison of Two Text Representations for Sentiment Analysis 010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational

More information

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability

Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Developing True/False Test Sheet Generating System with Diagnosing Basic Cognitive Ability Shih-Bin Chen Dept. of Information and Computer Engineering, Chung-Yuan Christian University Chung-Li, Taiwan

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 -

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 - C.E.F.R. Oral Assessment Criteria Think A F R I C A - 1 - 1. The extracts in the left hand column are taken from the official descriptors of the CEFR levels. How would you grade them on a scale of low,

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

International Examinations. IGCSE English as a Second Language Teacher s book. Second edition Peter Lucantoni and Lydia Kellas

International Examinations. IGCSE English as a Second Language Teacher s book. Second edition Peter Lucantoni and Lydia Kellas International Examinations IGCSE English as a Second Language Teacher s book Second edition Peter Lucantoni and Lydia Kellas To Costas Djapouras, without whose help and support this book would never have

More information

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &,

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, ! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, 4 The Interaction of Knowledge Sources in Word Sense Disambiguation Mark Stevenson Yorick Wilks University of Shef eld University of Shef eld Word sense

More information

CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS

CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS CONCEPT MAPS AS A DEVICE FOR LEARNING DATABASE CONCEPTS Pirjo Moen Department of Computer Science P.O. Box 68 FI-00014 University of Helsinki pirjo.moen@cs.helsinki.fi http://www.cs.helsinki.fi/pirjo.moen

More information

Compositional Semantics

Compositional Semantics Compositional Semantics CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Words, bag of words Sequences Trees Meaning Representing Meaning An important goal of NLP/AI: convert natural language

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

Double Master Degrees in International Economics and Development

Double Master Degrees in International Economics and Development Double Master Degrees in International Economics and Development I. Recruitment condition The admissions procedure is open to all students who meet the following conditions: - Condition of diploma: + Candidates

More information

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions. to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

Busuu The Mobile App. Review by Musa Nushi & Homa Jenabzadeh, Introduction. 30 TESL Reporter 49 (2), pp

Busuu The Mobile App. Review by Musa Nushi & Homa Jenabzadeh, Introduction. 30 TESL Reporter 49 (2), pp 30 TESL Reporter 49 (2), pp. 30 38 Busuu The Mobile App Review by Musa Nushi & Homa Jenabzadeh, Shahid Beheshti University, Tehran, Iran Introduction Technological innovations are changing the second language

More information

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2

Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu

More information

An Introduction to the Minimalist Program

An Introduction to the Minimalist Program An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:

More information

A Metacognitive Approach to Support Heuristic Solution of Mathematical Problems

A Metacognitive Approach to Support Heuristic Solution of Mathematical Problems A Metacognitive Approach to Support Heuristic Solution of Mathematical Problems John TIONG Yeun Siew Centre for Research in Pedagogy and Practice, National Institute of Education, Nanyang Technological

More information

An ICT environment to assess and support students mathematical problem-solving performance in non-routine puzzle-like word problems

An ICT environment to assess and support students mathematical problem-solving performance in non-routine puzzle-like word problems An ICT environment to assess and support students mathematical problem-solving performance in non-routine puzzle-like word problems Angeliki Kolovou* Marja van den Heuvel-Panhuizen*# Arthur Bakker* Iliada

More information

Effectiveness of Electronic Dictionary in College Students English Learning

Effectiveness of Electronic Dictionary in College Students English Learning 2016 International Conference on Mechanical, Control, Electric, Mechatronics, Information and Computer (MCEMIC 2016) ISBN: 978-1-60595-352-6 Effectiveness of Electronic Dictionary in College Students English

More information

A Comparative Evaluation of Word Sense Disambiguation Algorithms for German

A Comparative Evaluation of Word Sense Disambiguation Algorithms for German A Comparative Evaluation of Word Sense Disambiguation Algorithms for German Verena Henrich, Erhard Hinrichs University of Tübingen, Department of Linguistics Wilhelmstr. 19, 72074 Tübingen, Germany {verena.henrich,erhard.hinrichs}@uni-tuebingen.de

More information

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification?

Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Can Human Verb Associations help identify Salient Features for Semantic Verb Classification? Sabine Schulte im Walde Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Seminar für Sprachwissenschaft,

More information

English-German Medical Dictionary And Phrasebook By A.H. Zemback

English-German Medical Dictionary And Phrasebook By A.H. Zemback English-German Medical Dictionary And Phrasebook By A.H. Zemback If you are searching for a ebook English-German Medical Dictionary and Phrasebook by A.H. Zemback in pdf form, then you've come to loyal

More information

CEFR Overall Illustrative English Proficiency Scales

CEFR Overall Illustrative English Proficiency Scales CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey

More information