Indonesian-English Transitive Translation for Cross-Language Information Retrieval
Mirna Adriani, Herika Hayurani, and Syandra Sari
Faculty of Computer Science
University of Indonesia
Depok 16424, Indonesia

Abstract. This is a report on our evaluation of several language resources for the Indonesian-English bilingual task of the 2007 Cross-Language Evaluation Forum (CLEF). We translated an Indonesian query set into English using machine translation, transitive translation, and parallel corpus-based techniques. We also attempted to improve retrieval effectiveness using a query expansion technique. The results show that the best retrieval performance was achieved by combining the machine translation technique with the query expansion technique.

1 Introduction
To participate in the bilingual task of the 2007 Cross-Language Evaluation Forum (CLEF), i.e., Indonesian-English CLIR, we needed language resources to translate Indonesian queries into English. However, few such resources are freely available on the Internet, so we sought out resources that could be used for the translation process. We learned from our previous work [1, 2] that freely available dictionaries on the Internet could not correctly translate many Indonesian terms, as their vocabulary was very limited. This led us to explore other approaches, such as machine translation techniques [3], parallel corpus-based techniques, and transitive translation techniques. Previous work has demonstrated that a parallel corpus can be used to find word pairs in different languages [4, 5, 6]. These word pairs can then be used to translate queries from one language so that they can retrieve documents in another language. If such a resource is not available, another possibility is to translate through an intermediate language, known as a pivot language, that has more language resources [3, 7, 8].
C. Peters et al. (Eds.): CLEF 2007, LNCS 5152, pp , Springer-Verlag Berlin Heidelberg 2008

2 The Query Translation Process
As a first step, we manually translated the original CLEF query set from English into Indonesian. We then translated the resulting Indonesian queries back into English using three techniques: machine translation, transitive translation, and a parallel corpus. For the machine translation technique, we translated the Indonesian queries into English using a machine translation system available on the Internet. The transitive technique uses German and French as pivot languages: the Indonesian queries are translated into German and French using bilingual dictionaries, and the German and French queries are then translated into English using German-English and French-English dictionaries. The third technique uses a parallel corpus to translate the Indonesian queries. We created the parallel corpus by translating all the English documents in the CLEF collection into Indonesian using a commercial machine translation package called Transtool¹. We then created the English queries by taking a certain number of terms from a certain number of the top-ranked documents.

2.1 Query Expansion Technique
Adding relevant terms to the translated queries (known as query expansion) has been shown to improve CLIR effectiveness [1, 3]. One query expansion technique is pseudo relevance feedback [5]. This technique assumes that the top few documents initially retrieved are relevant to the query and therefore contain other terms that are also relevant to the query. Query expansion adds such terms to the original query. We applied this technique in this work. To choose the relevant terms from the top-ranked documents, we used the tf*idf term weighting formula [9] and added a certain number of the terms with the highest weights.

3 Experiment
We participated in the bilingual task using the English topics, which we had manually translated into Indonesian. The English document collection contains 190,604 documents from two English newspapers, the Glasgow Herald and the Los Angeles Times. We used the title and the description fields provided with the query topics. The query translation process was performed fully automatically using the machine translation technique, the transitive technique, and the parallel corpus (Figure 1).
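The pseudo relevance feedback step of Section 2.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the paper only cites tf*idf, so the exact weighting used here (term frequency summed over the top N documents, multiplied by log(collection size / document frequency)), the helper names `expansion_terms` and `expand_query`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def expansion_terms(top_docs, collection, query_terms, num_terms):
    """Select num_terms expansion terms from the top-ranked documents.

    Assumed tf-idf variant (hypothetical; the paper only cites tf*idf):
    weight(t) = (sum of t's frequency over top_docs) * log(|collection| / df(t)).
    top_docs must be drawn from collection so that df(t) > 0.
    """
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))  # count each term once per document

    tf = Counter()
    for doc in top_docs:  # pseudo relevance: treat the top documents as relevant
        tf.update(doc)

    weights = {
        term: freq * math.log(n_docs / df[term])
        for term, freq in tf.items()
        if term not in query_terms  # only add terms not already in the query
    }
    # Highest weight first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:num_terms]


def expand_query(query, top_docs, collection, num_terms):
    """Pseudo relevance feedback: append the selected terms to the original query."""
    return list(query) + expansion_terms(top_docs, collection, set(query), num_terms)


if __name__ == "__main__":
    # Toy, hypothetical collection of tokenized English documents.
    collection = [
        ["flood", "rain", "rain", "river"],
        ["flood", "rain", "dam"],
        ["flood", "election"],
        ["vote", "party"],
        ["goal", "match"],
    ]
    # Assume the first two documents were ranked highest for the query "flood".
    print(expand_query(["flood"], collection[:2], collection, num_terms=2))
```

Run as a script, it prints `['flood', 'rain', 'dam']`: "rain" receives the highest weight because it occurs three times in the two top-ranked documents, while "dam" and "river" tie and the tie is broken alphabetically. A real system would also remove stopwords and use the retrieval engine's own ranking to pick the top documents.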
The machine translation technique translates the Indonesian queries into English using Toggletext², a machine translation system that is available on the Internet. The transitive technique translates the Indonesian queries into English through German and French as pivot languages, using bilingual dictionaries. Each Indonesian word is translated into German or French if it is found in the corresponding bilingual dictionary; otherwise it is left in the original language. We tried several approaches to combining the two pivot paths: using the English senses found through either the German or the French dictionary (Union), and using only the English senses found through both the German and the French dictionaries (Intersection). For the parallel corpus-based technique, we used pseudo translation to obtain English words for the Indonesian queries. First, an Indonesian query is used to retrieve the top N Indonesian documents through an IR system. Next, we identified the English documents that are parallel (paired) to these top N Indonesian documents. From these N English documents, we created the equivalent English query from the top T terms with the highest tf-idf scores [9].
¹ See
² See
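The three-step pseudo translation through the parallel corpus can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the naive term-overlap ranking in the hypothetical `rank_by_overlap` stands in for the real IR system, the tf-idf variant is an assumption, and the Indonesian-English document pairs are hypothetical toy data.

```python
import math
from collections import Counter


def rank_by_overlap(query, docs):
    """Toy retrieval (hypothetical stand-in for a real IR system):
    rank document indices by the total count of query tokens they contain."""
    scored = [(sum(doc.count(q) for q in query), i) for i, doc in enumerate(docs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by index
    return [i for score, i in scored if score > 0]


def top_tfidf_terms(docs, collection, num_terms):
    """Return the num_terms terms of docs with the highest (summed tf) * idf weight."""
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    tf = Counter()
    for doc in docs:
        tf.update(doc)
    weights = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
    return sorted(weights, key=lambda t: (-weights[t], t))[:num_terms]


def pseudo_translate(query, parallel_corpus, top_n, top_t):
    """Build an English query from an Indonesian one via a parallel corpus.

    parallel_corpus: list of (indonesian_tokens, english_tokens) document pairs.
    1. Retrieve the top N Indonesian documents for the Indonesian query.
    2. Take the English documents paired with them.
    3. Use the top T tf-idf terms of those English documents as the English query.
    """
    indonesian_docs = [ind for ind, _ in parallel_corpus]
    english_docs = [eng for _, eng in parallel_corpus]
    top_ids = rank_by_overlap(query, indonesian_docs)[:top_n]
    return top_tfidf_terms([english_docs[i] for i in top_ids], english_docs, top_t)


if __name__ == "__main__":
    # Toy, hypothetical parallel corpus of tokenized document pairs.
    corpus = [
        (["banjir", "sungai", "kota"], ["flood", "river", "city"]),
        (["banjir", "sungai", "meluap"], ["flood", "river", "overflows"]),
        (["pemilu", "partai", "kota"], ["election", "party", "city"]),
        (["gempa", "kota"], ["earthquake", "city"]),
        (["harga", "beras"], ["rice", "price"]),
    ]
    print(pseudo_translate(["banjir", "sungai"], corpus, top_n=2, top_t=3))
```

Run as a script, it prints `['flood', 'river', 'overflows']`. The sketch makes the dependency explicit: the English query can only be as good as the pairing between the two sides of the corpus, so a low-quality machine-translated side degrades both the Indonesian retrieval step and the terms it selects.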
Fig. 1. The translation techniques that are used in the experiments:
1. Indonesian query → machine translation → English query
2. Indonesian query → parallel corpus → English query
3. Indonesian query → German → English query (using dictionary)
4. Indonesian query → French → English query (using dictionary)
5. English queries containing the output of 3 and 4
6. Indonesian query → French and German → combined English query

We then applied a pseudo relevance feedback query expansion technique to the queries translated using the three techniques above. We used the Lemur³ information retrieval system, which is based on language modeling, to index and retrieve the documents. We also used the synonym operator to group the alternative translations found in the dictionaries; the synonym operator gives the same weight to all the words inside it.

4 Results
Our work focused on the bilingual task, using Indonesian queries to retrieve documents from the English collection. Our experiments include official runs, which have identification labels, and unofficial runs, which do not. Tables 1-6 show the results of our experiments. With machine translation, the retrieval performance of the title-based queries was 15.59% below that of the equivalent monolingual run (see Table 1), and the performance of the combined title and description queries was 15.72% below that of the equivalent monolingual queries.

Table 1. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using machine translation
Query                                   | Monolingual | Machine Translation (MT)
Title (depok.uiqttoggle)                |             | (-15.59%)
Title + Description (depok.uiqtdtoggle) |             | (-15.72%)

³ See
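Each percentage in Table 1, and in the tables that follow, is the change in MAP of a translated run relative to the corresponding monolingual run. Because all runs are compared against the same baseline, the difference between two such percentages is a difference in percentage points of monolingual MAP, not the relative gain of one translated run over the other. The small Python sketch below (the function names are illustrative, not from the paper) makes that distinction explicit using the percentages reported for the title queries.

```python
def relative_change(run_map, mono_map):
    """Change in MAP of a translated run relative to the monolingual run, in percent."""
    return 100.0 * (run_map - mono_map) / mono_map


def point_difference(change_a, change_b):
    """Difference between two relative changes, in percentage points of monolingual MAP."""
    return change_b - change_a


def relative_gain(change_a, change_b):
    """Relative improvement of run b over run a, in percent, computed from their
    changes with respect to the same monolingual run (the monolingual MAP cancels)."""
    return 100.0 * ((100.0 + change_b) / (100.0 + change_a) - 1.0)


if __name__ == "__main__":
    # Title queries: MT alone -15.59% (Table 1), MT + QE -1.64% (Table 2).
    print(round(point_difference(-15.59, -1.64), 2))  # 13.95 percentage points
    print(round(relative_gain(-15.59, -1.64), 1))     # 16.5 (% relative gain)
```

So the title-based MT + QE run is 13.95 percentage points closer to monolingual performance than MT alone, which corresponds to a relative gain of roughly 16.5% over the MT-only run. Likewise, the 2.17-point improvement from query expansion in the parallel corpus runs of Table 6 (from -90.77% to -88.60%) amounts to a relative gain of roughly 23.5%, because the baseline it improves on is so low.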
After the query expansion technique was applied to the machine-translated queries, the retrieval performance of the title-based queries was only 1.64% below that of the equivalent monolingual run (see Table 2), an improvement of 13.95 percentage points over machine translation alone. For the combined title and description queries, query expansion yielded performance 4.38% below that of the equivalent monolingual queries, an improvement of 11.34 percentage points over machine translation alone.

Table 2. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using the machine translation and query expansion techniques
Query                                           | Monolingual | MT + QE
Title (depok.uiqttogglefb10d10t)                |             | (-1.64%)
Title + Description (depok.uiqtdtogglefb10d10t) |             | (-4.38%)

Table 3. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using transitive translation (Indonesian queries are translated to English queries via German only, via French only, and via both)
Query                                                   | Monolingual | Transitive Translation
Title + Description (via French only, depok.uiqtdfrsyn) |             | (-33.50%)
Title + Description (via German only, depok.uiqtddesyn) |             | (-29.04%)
Title + Description (via German and French)             |             | (-33.18%)

The results of using the transitive translation technique for the combined title and description queries are shown in Table 3. Translating the queries into English using French as the pivot language decreased the mean average precision by 33.50% compared to the monolingual queries, while using German as the pivot language decreased it by 29.04%.
Translating the Indonesian queries into English using both pivot languages decreased the mean average precision by 33.18% compared to the monolingual queries. In this setting, the Indonesian queries were translated into English via both German and French, and the English terms derived from the German and French words were combined by taking either the union or the intersection of the two sets. Taking the intersection and adding the Indonesian words that could not be translated into English resulted in a drop in average precision of 34.56% compared to the equivalent monolingual queries. Applying the query expansion technique to the resulting English queries (see Table 5) produced retrieval performance 15.26% to 18.71% below that of the equivalent monolingual queries. The best result of using query expansion on the translated queries was obtained with the intersection approach, with retrieval performance 15.26% lower than that of the equivalent monolingual queries. When the query expansion technique was applied to the queries translated using German as the pivot language, the average retrieval performance was 14.69% to 17.60% below that of the equivalent monolingual queries (see Table 4).

Table 4. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using transitive translation (Indonesian queries are translated to English queries via German)
Query                                               | Monolingual | Transitive Translation
Title + Description                                 |             | (-29.04%)
Title + Description + QE (depok.uiqtddesynfb10d10t) |             | (-17.60%)
Title + Description + QE (depok.uiqtddesynfb10d10t) |             | (-14.69%)
Title + Description + QE (depok.uiqtddesynfb5d10t)  |             | (-15.38%)

Table 5. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using transitive translation (Indonesian queries are translated to English queries via German and French), with and without query expansion
Query                                                              | Monolingual | Transitive Translation
Title + Description (uiqtintersectionunionsyn)                     |             | (-30.20%)
Title + Description + QE (depok.uiqtdintersectionunionsynfb5d10t)  |             | (-15.26%)
Title + Description + QE (depok.uiqtdintersectionunionsynfb10d10t) |             | (-18.71%)
Title + Description (Union)                                        |             | (-33.18%)
Title + Description (Intersection & add untranslated Ind terms)    |             | (-34.56%)
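The Union and Intersection strategies for combining the German and French pivot paths, together with keeping untranslated Indonesian words and grouping alternative translations under a synonym operator, can be sketched in Python as follows. The dictionaries are hypothetical toy data, the rule of keeping a word in Indonesian whenever no English sense survives is an assumption, and the `#syn(...)` rendering imitates the Lemur/Indri synonym operator rather than reproducing the authors' exact queries.

```python
def via_pivot(word, src_to_pivot, pivot_to_en):
    """All English senses reachable from one source word through one pivot language."""
    english = set()
    for pivot_word in src_to_pivot.get(word, []):
        english.update(pivot_to_en.get(pivot_word, []))
    return english


def transitive_translate(query, pivots, mode="union", keep_untranslated=True):
    """Translate a tokenized Indonesian query into groups of English alternatives.

    pivots: non-empty list of (Indonesian->pivot, pivot->English) dictionary pairs.
    mode:   "union" keeps senses found through any pivot;
            "intersection" keeps only senses found through every pivot.
    Words with no English sense are kept in Indonesian if keep_untranslated is True.
    """
    groups = []
    for word in query:
        sense_sets = [via_pivot(word, to_pivot, to_en) for to_pivot, to_en in pivots]
        if mode == "union":
            senses = set().union(*sense_sets)
        elif mode == "intersection":
            senses = set.intersection(*sense_sets)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if senses:
            groups.append(sorted(senses))
        elif keep_untranslated:
            groups.append([word])  # leave the untranslated Indonesian word in the query
    return groups


def to_syn_query(groups):
    """Wrap each group of alternative translations in an Indri-style #syn() operator."""
    return " ".join(g[0] if len(g) == 1 else "#syn(" + " ".join(g) + ")" for g in groups)


# Hypothetical toy dictionaries (Indonesian -> German/French -> English).
ID_DE = {"banjir": ["Flut", "Hochwasser"], "sungai": ["Fluss"]}
DE_EN = {"Flut": ["flood", "tide"], "Hochwasser": ["flood"], "Fluss": ["river", "stream"]}
ID_FR = {"banjir": ["inondation", "crue"], "sungai": ["fleuve"]}
FR_EN = {"inondation": ["flood"], "crue": ["flood", "spate"], "fleuve": ["river"]}

if __name__ == "__main__":
    query = ["banjir", "sungai", "jakarta"]
    both = [(ID_DE, DE_EN), (ID_FR, FR_EN)]
    print(to_syn_query(transitive_translate(query, both, "union")))
    # #syn(flood spate tide) #syn(river stream) jakarta
    print(to_syn_query(transitive_translate(query, both, "intersection")))
    # flood river jakarta
    print(to_syn_query(transitive_translate(query, [(ID_DE, DE_EN)])))
    # #syn(flood tide) #syn(river stream) jakarta
```

The sketch also shows where ambiguity comes from in transitive translation: every German or French translation of an Indonesian word is looked up again, so each extra sense in either dictionary can add English alternatives, and the union of two pivots accumulates the alternatives from both, while the intersection keeps only the senses that both paths agree on.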
Table 6. Mean Average Precision (MAP) of the monolingual runs of the title and combination of title and description topics and their translation queries using parallel corpus and query expansion
Query                                           | Monolingual | Parallel Corpus
Title + Description                             |             | (-90.77%)
Title + Description + QE (5 terms from 5 terms) |             | (-88.60%)

Next, we obtained English translations of the Indonesian queries using the parallel corpus-based technique. The pseudo translation took the English documents that are parallel to the Indonesian documents ranked highest for each Indonesian query by the information retrieval system. We then used as the English query the top T English terms with the highest weights within these top N documents. The results (see Table 6) show that the mean average precision was 90.77% below that of the equivalent monolingual queries. Applying query expansion to these English queries improved the mean average precision by only 2.17 percentage points. The parallel corpus-based technique performed very poorly because the Indonesian versions of the English documents in the corpus were of poor quality in terms of translation accuracy.

The retrieval performance of transitive translation through one pivot language, German, was better than through two pivot languages, German and French. Translating Indonesian queries through German produced fewer definitions or senses than translating through French, which means that translating through Indonesian-German-English introduces less ambiguity than translating through Indonesian-French-English.

5 Summary
Our results show that queries translated from Bahasa Indonesia using the machine translation technique achieved the best retrieval performance, compared to the transitive technique and the parallel corpus technique.
However, the two machine translation systems we used for Indonesian and English produced very different results: the best result was achieved by translating the Indonesian queries into English with one of them (Toggletext), whereas the parallel corpus created with the other (Transtool) produced poor results. The transitive translation results showed that using only one pivot language (German) gave better retrieval performance than using two pivot languages. Applying query expansion improved the retrieval performance of the translated queries. Although the transitive technique did not perform as well as the machine translation technique, it can be considered a viable alternative for the translation process, especially for languages with few available language resources, such as Bahasa Indonesia.
References
1. Adriani, M., van Rijsbergen, C.J.: Term Similarity Based Query Expansion for Cross Language Information Retrieval. In: Abiteboul, S., Vercoustre, A.-M. (eds.) ECDL LNCS, vol. 1696, pp Springer, Heidelberg (1999)
2. Adriani, M.: Ambiguity Problem in Multilingual Information Retrieval. In: CLEF 2000 Working Note Workshop, Portugal (2000)
3. Ballesteros, L.A.: Cross Language Retrieval via Transitive Translation. In: Croft, W.B. (ed.) Advances in Information Retrieval: Recent Research from the CIIR, pp Kluwer Academic Publishers, Dordrecht (2000)
4. Chen, J., Nie, J.: Automatic Construction of Parallel English-Chinese Corpus for Cross-Language Information Retrieval. In: Proceedings of the 6th Conference on Applied Natural Language Processing, pp ACM Press, New York (2000)
5. Lavrenko, V., Choquette, M., Croft, W.B.: Cross-Lingual Relevance Models. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp ACM Press, New York (2002)
6. Nie, J., Simard, M., Isabelle, P., Durand, R.: Cross-Language Information Retrieval Based on Parallel Text and Automatic Mining of Parallel Text from the Web. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York (1999)
7. Gollins, T., Sanderson, M.: Improving Cross Language Retrieval with Triangulated Retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp ACM Press, New York (2004)
8. Lehtokangas, R., Airio, E., Jarvelin, K.: Transitive Dictionary Translation Challenges Direct Dictionary Translation in CLIR. Information Processing and Management: An International Journal 40(6), (2004)
9. Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationTHE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF MATHEMATICS ASSESSING THE EFFECTIVENESS OF MULTIPLE CHOICE MATH TESTS ELIZABETH ANNE SOMERS Spring 2011 A thesis submitted in partial
More informationThe Role of String Similarity Metrics in Ontology Alignment
The Role of String Similarity Metrics in Ontology Alignment Michelle Cheatham and Pascal Hitzler August 9, 2013 1 Introduction Tim Berners-Lee originally envisioned a much different world wide web than
More informationLongest Common Subsequence: A Method for Automatic Evaluation of Handwritten Essays
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 6, Ver. IV (Nov Dec. 2015), PP 01-07 www.iosrjournals.org Longest Common Subsequence: A Method for
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationI N T E R P R E T H O G A N D E V E L O P HOGAN BUSINESS REASONING INVENTORY. Report for: Martina Mustermann ID: HC Date: May 02, 2017
S E L E C T D E V E L O P L E A D H O G A N D E V E L O P I N T E R P R E T HOGAN BUSINESS REASONING INVENTORY Report for: Martina Mustermann ID: HC906276 Date: May 02, 2017 2 0 0 9 H O G A N A S S E S
More informationDOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY?
DOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY? Noor Rachmawaty (itaw75123@yahoo.com) Istanti Hermagustiana (dulcemaria_81@yahoo.com) Universitas Mulawarman, Indonesia Abstract: This paper is based
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationObserving Teachers: The Mathematics Pedagogy of Quebec Francophone and Anglophone Teachers
Observing Teachers: The Mathematics Pedagogy of Quebec Francophone and Anglophone Teachers Dominic Manuel, McGill University, Canada Annie Savard, McGill University, Canada David Reid, Acadia University,
More informationMeasurement. When Smaller Is Better. Activity:
Measurement Activity: TEKS: When Smaller Is Better (6.8) Measurement. The student solves application problems involving estimation and measurement of length, area, time, temperature, volume, weight, and
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationGENERAL COMMENTS Some students performed well on the 2013 Tamil written examination. However, there were some who did not perform well.
2013 Languages: Tamil GA 3: Written component GENERAL COMMENTS Some students performed well on the 2013 Tamil written examination. However, there were some who did not perform well. The marks allocated
More informationUniversiteit Leiden ICT in Business
Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:
More informationA Student s Assistant for Open e-learning
T4E 2009 Aparna Lalingar IIITB * Bangalore, India e-mail: aparna.l@iiitb.ac.in A Student s Assistant for Open e-learning Srinivasan Ramani IIITB * and HP Labs India Bangalore, India e-mail: ramanisl@vsnl.com
More informationEnglish-German Medical Dictionary And Phrasebook By A.H. Zemback
English-German Medical Dictionary And Phrasebook By A.H. Zemback If you are searching for a ebook English-German Medical Dictionary and Phrasebook by A.H. Zemback in pdf form, then you've come to loyal
More informationCommunity-oriented Course Authoring to Support Topic-based Student Modeling
Community-oriented Course Authoring to Support Topic-based Student Modeling Sergey Sosnovsky, Michael Yudelson, Peter Brusilovsky School of Information Sciences, University of Pittsburgh, USA {sas15, mvy3,
More informationOrganizational Knowledge Distribution: An Experimental Evaluation
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 24 Proceedings Americas Conference on Information Systems (AMCIS) 12-31-24 : An Experimental Evaluation Surendra Sarnikar University
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationCommon Core State Standards
Common Core State Standards Common Core State Standards 7.NS.3 Solve real-world and mathematical problems involving the four operations with rational numbers. Mathematical Practices 1, 3, and 4 are aspects
More informationAn evolutionary survey from Monolingual Text Reuse to Cross Lingual Text Reuse in context to English-Hindi. Aarti Kumar*, Sujoy Das** IJSER
996 An evolutionary survey from Monolingual Text Reuse to Cross Lingual Text Reuse in context to English-Hindi Aarti Kumar*, Sujoy Das** Abstract-With enormous amount of information in multiple efficient
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationLiterature and the Language Arts Experiencing Literature
Correlation of Literature and the Language Arts Experiencing Literature Grade 9 2 nd edition to the Nebraska Reading/Writing Standards EMC/Paradigm Publishing 875 Montreal Way St. Paul, Minnesota 55102
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationStandardized Assessment & Data Overview December 21, 2015
Standardized Assessment & Data Overview December 21, 2015 Peters Township School District, as a public school entity, will enable students to realize their potential to learn, live, lead and succeed. 2
More informationThe University of Amsterdam s Concept Detection System at ImageCLEF 2011
The University of Amsterdam s Concept Detection System at ImageCLEF 2011 Koen E. A. van de Sande and Cees G. M. Snoek Intelligent Systems Lab Amsterdam, University of Amsterdam Software available from:
More informationToward Reproducible Baselines: The Open-Source IR Reproducibility Challenge
Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge Jimmy Lin 1(B), Matt Crane 1, Andrew Trotman 2, Jamie Callan 3, Ishan Chattopadhyaya 4, John Foley 5, Grant Ingersoll 4, Craig
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More information(Care-o-theque) Pflegiothek is a care manual and the ideal companion for those working or training in the areas of nursing-, invalid- and geriatric
vocational education CARING PROFESSIONS In guten Händen (In Good Hands) Nurse training is undergoing worldwide change. Social and health policy changes (demographic changes, healthcare prevention and prophylaxis
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationScienceDirect. Malayalam question answering system
Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam
More informationComputer Science PhD Program Evaluation Proposal Based on Domain and Non-Domain Characteristics
Computer Science PhD Program Evaluation Proposal Based on Domain and Non-Domain Characteristics Jan Werewka, Michał Turek Department of Applied Computer Science AGH University of Science and Technology
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationAge Effects on Syntactic Control in. Second Language Learning
Age Effects on Syntactic Control in Second Language Learning Miriam Tullgren Loyola University Chicago Abstract 1 This paper explores the effects of age on second language acquisition in adolescents, ages
More informationDEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS
DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za
More informationA Case-Based Approach To Imitation Learning in Robotic Agents
A Case-Based Approach To Imitation Learning in Robotic Agents Tesca Fitzgerald, Ashok Goel School of Interactive Computing Georgia Institute of Technology, Atlanta, GA 30332, USA {tesca.fitzgerald,goel}@cc.gatech.edu
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More information