CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
|
|
- Mavis Wiggins
- 6 years ago
- Views:
Transcription
1 1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís
2 Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis (LSA) CLIR using LSA Results of CLIR using LSA CLIR using PARAFAC2 Results of CLIR using PARAFAC2 Conclusions
3 Cross-Language IR (CLIR) 3 retrieve documents in one language to a query in another language example: a user may create a query in English but retrieve relevant documents written in French two approaches: translation of documents or queries [Hull et al. and Demner-Fushman et al.] mapping of queries and documents into a multilingual space
4 Cross-Language IR (CLIR) 4 multilingual spaces approaches latent model compute latent concepts from data and map documents to these concepts example: LSA (Latent Semantic Analysis) [Dumais et al. 1988] external category model map documents to a set of external categories, topics, or concepts vectors remain constant across different document collections example: ESA (Explicit Semantic Analysis) [Gabrilovich et al. 2007]
5 Latent Semantic Analysis 5 analyze relationships between documents and terms produce a set of latent concepts related to the documents and terms merge the dimensions (terms) that have similar meanings latent concepts correspond to certain topics that emerge from the document collection
6 Latent Semantic Analysis (cont.) 6 SVD (Singular Value Decomposition) performs the factorization of the term-by-document matrix X = USV t U - term-by-concept matrix S - diagonal matrix with the singular values ( strength of each concept) V- document-by-concept matrix image retrieved from Wikipedia
7 Latent Semantic Analysis (cont.) 7 example (two concepts: computer science and medicine) data inf. retrieval brain lung CS = x x MD image retrieved from Jure Leskovec s recitation
8 Latent Semantic Analysis (cont.) 8 querying map the query into the semantic space q concept = qus 1 calculate the cosine similarity measure between the query and the documents
9 CLIR using LSA 9 trained with a multilingual parallel aligned corpus example: Europarl [Philipp Koehn 2005], JRC-Acquis [Steinberger et al. 2006], etc English - that is almost a personal record for me this autumn! Portuguese - é quase o meu recorde pessoal deste semestre! each document consists of the concatenation of all the languages [Paul G. Young 1994] terms from all languages appear in any given document
10 CLIR using LSA (cont.) 10 example (two concepts: computer science and medicine) data inf. retrieval brain lung informacion datos CS MD image retrieved from Jure Leskovec s recitation
11 Results of CLIR using LSA 11 used the Bible as a parallel corpus the world s most widely translated book 2,426 partial translations 429 full translations how representative the vocabulary of the Bible is of modern vocabulary coverage of vocabulary is around 70% (according to their experiments)
12 Results of CLIR using LSA (cont.) 12 parallel corpus (the Bible) with 77 translations term-by-document matrix was 1,454,289 x 31,226 extremely sparse number of concepts (dimensions) equal to 280 test data 114 chapters of the Quran select languages: Arabic, English, French, Russian, and Spanish (total of 570 documents)
13 Results of CLIR using LSA (cont.) 13 each document divided into words and their frequencies were weighted vector was multiplied by US 1 (projected into a 300 -dimensional LSA space) evaluation measures precision at 1 document (for a given source and target language) multilingual precision at 5 documents (for 5 languages)
14 Results of CLIR using LSA (cont.) 14 precision at 1 document proportion of cases where the translation was retrieved first multilingual precision at 5 documents proportion of the top 5 retrieved results which are translations of the query
15 Results of CLIR using LSA (cont.) 15 precision at 1 document with LSA (280 dimensions) average: 0.780
16 Results of CLIR using LSA (cont.) 16 good ability to identify translations translation is retrieved first almost 80% of the time low multilingual precision documents cluster by language and not by topics
17 Results of CLIR using LSA (cont.) 17 - table illustrating low multilingual precision - statistical differences between languages
18 Results of CLIR using LSA (cont.) 18 advantages of LSA in CLIR relies only on the ability to tokenize text at he boundaries between words limitations of LSA in CLIR cannot distinguish homographs words from the different languages example: English coin versus French coin (corner) clustering documents language independently goal: group documents about similar topics problem: documents are clustered by language, not by topic
19 CLIR using PARAFAC2 19 tries to overcome LSA problems unable to make associations between words in different languages PARAFAC2 is a variant of PARAFAC PARAFAC is a multi-way generalization of the SVD [Richard A. Harshman 1970] imposes a constraint (not present in LSA) concepts in all documents in the parallel corpus are the same regardless of language
20 CLIR using PARAFAC2 (cont.) 20 form an irregular three-way array each slice is a separate term-by-document matrix for a single language in the parallel corpus
21 CLIR using PARAFAC2 (cont.) 21 X k = U k HS k V t Xk - M x N matrix (k denote the kth slice) Uk - a Mk x R matrix (R is the number of dimensions of the LSA space) H - a R x R matrix Sk - a x R diagonal matrix of weights for the kth slice of X V - an N x R factor matrix for the documents separate mapping for each language
22 CLIR using PARAFAC2 (cont.) 22 querying map the query into the semantic space q concept = qu k S k 1 1 vector is multiplied by the U k S k specific to the language of the query, rather than a general for all languages US 1 calculate the cosine similarity measure between the query and the documents
23 Results of CLIR using PARAFAC2 23 multilingual precision metrics PARAFAC2 outperforms LSA by a significant margin average: average: 0.760
24 Results of CLIR using PARAFAC2 (cont.) 24 clustering precision metrics PARAFAC2 also outperforms LSA
25 Results of CLIR using PARAFAC2 (cont.) 25 disadvantages PARAFAC2 needs more computation to obtain matrix decomposition advantages language specific general ones U k matrix multiplication is faster can deal with homographs matrices are smaller than the
26 Conclusions 26 PARAFAC2 is a highly compelling technique promising way for truly language-independent clustering of documents by topic PARAFAC2 is also a good technique for other problems in CLIR (not only for multilingual document clustering)
Probabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationLatent Semantic Analysis
Latent Semantic Analysis Adapted from: www.ics.uci.edu/~lopes/teaching/inf141w10/.../lsa_intro_ai_seminar.ppt (from Melanie Martin) and http://videolectures.net/slsfs05_hofmann_lsvm/ (from Thomas Hoffman)
More informationInitial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on User Queries
Initial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on User Queries Marta R. Costa-jussà, Christian Paz-Trillo and Renata Wassermann 1 Computer Science Department
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationLanguage Independent Passage Retrieval for Question Answering
Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University
More informationComment-based Multi-View Clustering of Web 2.0 Items
Comment-based Multi-View Clustering of Web 2.0 Items Xiangnan He 1 Min-Yen Kan 1 Peichu Xie 2 Xiao Chen 3 1 School of Computing, National University of Singapore 2 Department of Mathematics, National University
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationCombining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval
Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval Jianqiang Wang and Douglas W. Oard College of Information Studies and UMIACS University of Maryland, College Park,
More informationA DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA
International Journal of Semantic Computing Vol. 5, No. 4 (2011) 433 462 c World Scientific Publishing Company DOI: 10.1142/S1793351X1100133X A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationA Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval
A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval Yelong Shen Microsoft Research Redmond, WA, USA yeshen@microsoft.com Xiaodong He Jianfeng Gao Li Deng Microsoft Research
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationBridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models
Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models Jung-Tae Lee and Sang-Bum Kim and Young-In Song and Hae-Chang Rim Dept. of Computer &
More informationMeasurement. When Smaller Is Better. Activity:
Measurement Activity: TEKS: When Smaller Is Better (6.8) Measurement. The student solves application problems involving estimation and measurement of length, area, time, temperature, volume, weight, and
More informationCross-lingual Text Fragment Alignment using Divergence from Randomness
Cross-lingual Text Fragment Alignment using Divergence from Randomness Sirvan Yahyaei, Marco Bonzanini, and Thomas Roelleke Queen Mary, University of London Mile End Road, E1 4NS London, UK {sirvan,marcob,thor}@eecs.qmul.ac.uk
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationWord Translation Disambiguation without Parallel Texts
Word Translation Disambiguation without Parallel Texts Erwin Marsi André Lynum Lars Bungum Björn Gambäck Department of Computer and Information Science NTNU, Norwegian University of Science and Technology
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationOntological spine, localization and multilingual access
Start Ontological spine, localization and multilingual access Some reflections and a proposal New Perspectives on Subject Indexing and Classification in an International Context International Symposium
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationEvaluating vector space models with canonical correlation analysis
Natural Language Engineering: page 1 of 38. c Cambridge University Press 211 doi:1.117/s1351324911271 1 Evaluating vector space models with canonical correlation analysis SAMI VIRPIOJA 1, MARI-SANNA PAUKKERI
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationA Semantic Similarity Measure Based on Lexico-Syntactic Patterns
A Semantic Similarity Measure Based on Lexico-Syntactic Patterns Alexander Panchenko, Olga Morozova and Hubert Naets Center for Natural Language Processing (CENTAL) Université catholique de Louvain Belgium
More informationROSETTA STONE PRODUCT OVERVIEW
ROSETTA STONE PRODUCT OVERVIEW Method Rosetta Stone teaches languages using a fully-interactive immersion process that requires the student to indicate comprehension of the new language and provides immediate
More informationKnowledge-Free Induction of Inflectional Morphologies
Knowledge-Free Induction of Inflectional Morphologies Patrick SCHONE Daniel JURAFSKY University of Colorado at Boulder University of Colorado at Boulder Boulder, Colorado 80309 Boulder, Colorado 80309
More informationMatching Meaning for Cross-Language Information Retrieval
Matching Meaning for Cross-Language Information Retrieval Jianqiang Wang Department of Library and Information Studies University at Buffalo, the State University of New York Buffalo, NY 14260, U.S.A.
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationMatching Similarity for Keyword-Based Clustering
Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationCOPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS
COPING WITH LANGUAGE DATA SPARSITY: SEMANTIC HEAD MAPPING OF COMPOUND WORDS Joris Pelemans 1, Kris Demuynck 2, Hugo Van hamme 1, Patrick Wambacq 1 1 Dept. ESAT, Katholieke Universiteit Leuven, Belgium
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationSTUDENTS' RATINGS ON TEACHER
STUDENTS' RATINGS ON TEACHER Faculty Member: CHEW TECK MENG IVAN Module: Activity Type: DATA STRUCTURES AND ALGORITHMS I CS1020 LABORATORY Class Size/Response Size/Response Rate : 21 / 14 / 66.67% Contact
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationEvaluation of Usage Patterns for Web-based Educational Systems using Web Mining
Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More information*Net Perceptions, Inc West 78th Street Suite 300 Minneapolis, MN
From: AAAI Technical Report WS-98-08. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved. Recommender Systems: A GroupLens Perspective Joseph A. Konstan *t, John Riedl *t, AI Borchers,
More informationAttributed Social Network Embedding
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding
More informationENGLISH LANGUAGE LEARNERS (ELL) UPDATE FOR SUNSHINE STATE TESOL 2013
ENGLISH LANGUAGE LEARNERS (ELL) UPDATE FOR SUNSHINE STATE TESOL 2013 Presented by: Chane Eplin, Bureau Chief Student Achievement through Language Acquisition Florida Department of Education May 16, 2013
More informationAs a high-quality international conference in the field
The New Automated IEEE INFOCOM Review Assignment System Baochun Li and Y. Thomas Hou Abstract In academic conferences, the structure of the review process has always been considered a critical aspect of
More informationSyntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure Jeff Mitchell, Mirella Lapata, Vera Demberg and Frank Keller University of Edinburgh Edinburgh, United Kingdom jeff.mitchell@ed.ac.uk,
More informationThe Role of String Similarity Metrics in Ontology Alignment
The Role of String Similarity Metrics in Ontology Alignment Michelle Cheatham and Pascal Hitzler August 9, 2013 1 Introduction Tim Berners-Lee originally envisioned a much different world wide web than
More informationIntroducing the New Iowa Assessments Mathematics Levels 12 14
Introducing the New Iowa Assessments Mathematics Levels 12 14 ITP Assessment Tools Math Interim Assessments: Grades 3 8 Administered online Constructed Response Supplements Reading, Language Arts, Mathematics
More informationLecture 2: Quantifiers and Approximation
Lecture 2: Quantifiers and Approximation Case study: Most vs More than half Jakub Szymanik Outline Number Sense Approximate Number Sense Approximating most Superlative Meaning of most What About Counting?
More informationPrimary National Curriculum Alignment for Wales
Mathletics and the Welsh Curriculum This alignment document lists all Mathletics curriculum activities associated with each Wales course, and demonstrates how these fit within the National Curriculum Programme
More informationChanges to GCSE and KS3 Grading Information Booklet for Parents
Changes to GCSE and KS3 Grading Information Booklet for Parents Changes to assessment in Years 10 & 11 As you are probably aware the government has made radical changes to the structure and assessment
More informationGrade 6: Correlated to AGS Basic Math Skills
Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and
More informationTerm Weighting based on Document Revision History
Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465
More informationUMass at TDT Similarity functions 1. BASIC SYSTEM Detection algorithms. set globally and apply to all clusters.
UMass at TDT James Allan, Victor Lavrenko, David Frey, and Vikas Khandelwal Center for Intelligent Information Retrieval Department of Computer Science University of Massachusetts Amherst, MA 3 We spent
More informationPerformance Analysis of Optimized Content Extraction for Cyrillic Mongolian Learning Text Materials in the Database
Journal of Computer and Communications, 2016, 4, 79-89 Published Online August 2016 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2016.410009 Performance Analysis of Optimized
More informationAlignment of Australian Curriculum Year Levels to the Scope and Sequence of Math-U-See Program
Alignment of s to the Scope and Sequence of Math-U-See Program This table provides guidance to educators when aligning levels/resources to the Australian Curriculum (AC). The Math-U-See levels do not address
More information10.2. Behavior models
User behavior research 10.2. Behavior models Overview Why do users seek information? How do they seek information? How do they search for information? How do they use libraries? These questions are addressed
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationProgress Monitoring for Behavior: Data Collection Methods & Procedures
Progress Monitoring for Behavior: Data Collection Methods & Procedures This event is being funded with State and/or Federal funds and is being provided for employees of school districts, employees of the
More informationNumeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C
Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom
More informationConcepts and Properties in Word Spaces
Concepts and Properties in Word Spaces Marco Baroni 1 and Alessandro Lenci 2 1 University of Trento, CIMeC 2 University of Pisa, Department of Linguistics Abstract Properties play a central role in most
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationContact: For more information on Breakthrough visit or contact Carmel Crévola at Resources:
Carmel Crévola is an independent international literary consultant, author, and researcher who works extensively in Australia, Canada, the United Kingdom, and the United States. Carmel Crévola s presentation
More informationText-mining the Estonian National Electronic Health Record
Text-mining the Estonian National Electronic Health Record Raul Sirel rsirel@ut.ee 13.11.2015 Outline Electronic Health Records & Text Mining De-identifying the Texts Resolving the Abbreviations Terminology
More informationOn document relevance and lexical cohesion between query terms
Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,
More informationNORTH CAROLINA VIRTUAL PUBLIC SCHOOL IN WCPSS UPDATE FOR FALL 2007, SPRING 2008, AND SUMMER 2008
E&R Report No. 08.29 February 2009 NORTH CAROLINA VIRTUAL PUBLIC SCHOOL IN WCPSS UPDATE FOR FALL 2007, SPRING 2008, AND SUMMER 2008 Authors: Dina Bulgakov-Cooke, Ph.D., and Nancy Baenen ABSTRACT North
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationarxiv: v2 [cs.ir] 22 Aug 2016
Exploring Deep Space: Learning Personalized Ranking in a Semantic Space arxiv:1608.00276v2 [cs.ir] 22 Aug 2016 ABSTRACT Jeroen B. P. Vuurens The Hague University of Applied Science Delft University of
More informationBusuu The Mobile App. Review by Musa Nushi & Homa Jenabzadeh, Introduction. 30 TESL Reporter 49 (2), pp
30 TESL Reporter 49 (2), pp. 30 38 Busuu The Mobile App Review by Musa Nushi & Homa Jenabzadeh, Shahid Beheshti University, Tehran, Iran Introduction Technological innovations are changing the second language
More informationOutline. Web as Corpus. Using Web Data for Linguistic Purposes. Ines Rehbein. NCLT, Dublin City University. nclt
Outline Using Web Data for Linguistic Purposes NCLT, Dublin City University Outline Outline 1 Corpora as linguistic tools 2 Limitations of web data Strategies to enhance web data 3 Corpora as linguistic
More informationLearning to Rank with Selection Bias in Personal Search
Learning to Rank with Selection Bias in Personal Search Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork Google Inc. Mountain View, CA 94043 {xuanhui, bemike, metzler, najork}@google.com ABSTRACT
More informationADVANCED PLACEMENT STUDENTS IN COLLEGE: AN INVESTIGATION OF COURSE GRADES AT 21 COLLEGES. Rick Morgan Len Ramist
February 1998 Report No. SR-98-13 ADVANCED PLACEMENT STUDENTS IN COLLEGE: AN INVESTIGATION OF COURSE GRADES AT 21 COLLEGES Rick Morgan Len Ramist Unpublished Statistical Report This is a limited distribution
More informationSTAT 220 Midterm Exam, Friday, Feb. 24
STAT 220 Midterm Exam, Friday, Feb. 24 Name Please show all of your work on the exam itself. If you need more space, use the back of the page. Remember that partial credit will be awarded when appropriate.
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationLanguage Center. Course Catalog
Language Center Course Catalog 2016-2017 Mastery of languages facilitates access to new and diverse opportunities, and IE University (IEU) considers knowledge of multiple languages a key element of its
More informationDifferential Evolutionary Algorithm Based on Multiple Vector Metrics for Semantic Similarity Assessment in Continuous Vector Space
Differential Evolutionary Algorithm Based on Multiple Vector Metrics for Semantic Similarity Assessment in Continuous Vector Space Yuanyuan Cai, Wei Lu, Xiaoping Che, Kailun Shi School of Software Engineering
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationThe role of the first language in foreign language learning. Paul Nation. The role of the first language in foreign language learning
1 Article Title The role of the first language in foreign language learning Author Paul Nation Bio: Paul Nation teaches in the School of Linguistics and Applied Language Studies at Victoria University
More informationPostprint.
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,
More informationUnsupervised Cross-Lingual Scaling of Political Texts
Unsupervised Cross-Lingual Scaling of Political Texts Goran Glavaš and Federico Nanni and Simone Paolo Ponzetto Data and Web Science Group University of Mannheim B6, 26, DE-68159 Mannheim, Germany {goran,
More informationA study of speaker adaptation for DNN-based speech synthesis
A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,
More informationDublin City Schools Mathematics Graded Course of Study GRADE 4
I. Content Standard: Number, Number Sense and Operations Standard Students demonstrate number sense, including an understanding of number systems and reasonable estimates using paper and pencil, technology-supported
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationSummarizing Answers in Non-Factoid Community Question-Answering
Summarizing Answers in Non-Factoid Community Question-Answering Hongya Song Zhaochun Ren Shangsong Liang hongya.song.sdu@gmail.com zhaochun.ren@ucl.ac.uk shangsong.liang@ucl.ac.uk Piji Li Jun Ma Maarten
More informationEnglish-German Medical Dictionary And Phrasebook By A.H. Zemback
English-German Medical Dictionary And Phrasebook By A.H. Zemback If you are searching for a ebook English-German Medical Dictionary and Phrasebook by A.H. Zemback in pdf form, then you've come to loyal
More informationCROSS LANGUAGE INFORMATION RETRIEVAL FOR LANGUAGES WITH SCARCE RESOURCES. Christian E. Loza. Thesis Prepared for the Degree of MASTER OF SCIENCE
CROSS LANGUAGE INFORMATION RETRIEVAL FOR LANGUAGES WITH SCARCE RESOURCES Christian E. Loza Thesis Prepared for the Degree of MASTER OF SCIENCE UNIVERSITY OF NORTH TEXAS May 2009 APPROVED: Rada Mihalcea,
More informationMTH 215: Introduction to Linear Algebra
MTH 215: Introduction to Linear Algebra Fall 2017 University of Rhode Island, Department of Mathematics INSTRUCTOR: Jonathan A. Chávez Casillas E-MAIL: jchavezc@uri.edu LECTURE TIMES: Tuesday and Thursday,
More informationMatrices, Compression, Learning Curves: formulation, and the GROUPNTEACH algorithms
Matrices, Compression, Learning Curves: formulation, and the GROUPNTEACH algorithms Bryan Hooi 1, Hyun Ah Song 1, Evangelos Papalexakis 1, Rakesh Agrawal 2, and Christos Faloutsos 1 1 Carnegie Mellon University,
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationLearning a Cross-Lingual Semantic Representation of Relations Expressed in Text
Learning a Cross-Lingual Semantic Representation of Relations Expressed in Text Achim Rettinger, Artem Schumilin, Steffen Thoma, and Basil Ell Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
More information2 Mitsuru Ishizuka x1 Keywords Automatic Indexing, PAI, Asserted Keyword, Spreading Activation, Priming Eect Introduction With the increasing number o
PAI: Automatic Indexing for Extracting Asserted Keywords from a Document 1 PAI: Automatic Indexing for Extracting Asserted Keywords from a Document Naohiro Matsumura PRESTO, Japan Science and Technology
More information