Cross-lingual named entity extraction and disambiguation
|
|
- Audrey Parks
- 6 years ago
- Views:
Transcription
1 Cross-lingual named entity extraction and disambiguation Tadej Štajner 1,2, Dunja Mladenić 1,2 1 Artificial Intelligence Laboratory, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International Postgraduate School, Ljubljana, Slovenia tadej.stajner@ijs.si Abstract. We propose a method for the task of identifying and disambiguation of named entities in a scenario where the language of the input text differs from the language of the knowledge base. We demonstrate this functionality on English and Slovene named entity disambiguation Keywords: Natural language processing, knowledge management, multilingual information management, cross-lingual information retrieval 1 Introduction Since a lot of our world s knowledge is present in textual format in multiple languages rather than a more explicit or language-neutral format, an interesting challenge is automatically integrating texts with structured and semi-structured resources, such as knowledge bases, collections of entities having various properties, such as labels and textual descriptions. Recent work focuses on the fact that all of this knowledge can be spread over many languages [6]. While Wikipedia, the free encyclopaedia, is a famous example, the same problem is applicable on many domains where text is present in multiple languages. In the domain of crosslingual text annotation, we focus on the tasks of entity extraction and disambiguation (NED). We demonstrate a multilingual named entity extraction and disambiguation pipeline, operating for English and Slovene in order to demonstrate the capability of re-using language resources across languages within the Enrycher system [8]. 1 Motivation Many machine translation systems are not aware of named entities and special handling that is often required for them, and instead simply attempt to literally translate them. This often results in errors, for instance in Google Translate
2 changing the name of the music band Foo Fighters into Sigur Ros, an Icelandic music band, when translating from English to Icelandic. This illustrates the need for special handling of proper names when doing machine translation. By performing named entity extraction and disambiguation before translation, we are able to use a knowledge base to find a correct translation for that named entity. The second problem comes up in performing NED in a language that has poor domain coverage in the knowledge base. Consequently, entities that are extracted are not correctly disambiguated, since they don t exist in that particular language. However, the entity that we are looking for can exist in the knowledge base in a different language. However, directly using that language introduces new problems, since many of the components assume that the language of the input text corresponds to the language of the knowledge base labels and descriptions. 2 Related work The simplest solution for cross-lingual entity disambiguation is the one that simply disregards the language mismatch and tries to use the full textual content to perform the context similarity without any additional processing [1]. The authors have shown that using a merged bilingual knowledge base performed significantly better than using just the document language knowledge base, mainly due to better domain coverage, but it performed much worse than a monolingual scenario. Another simple baseline uses the equivalent of just using the context-independent mention popularity measure, backed by a dictionary [2]. The dictionary can be constructed from looking at anchor texts from non-english to English Wikipedia pages. An ideal system would be the one that would simply translate the document in the desired language and do the disambiguation on the translation. While doing so manually is not feasible for our task, one may use machine translation to do this [6]. While they achieve up to 94% performance of a monolingual baseline, machine translation greatly complicates and slows down the processing, opening an window for more efficient approaches. 3 Problem description We state the problem as identifying and disambiguating concepts that appear as mentions within a fragment of text. Disambiguation is important because phrases may have many distinct meanings. While human readers are able to infer the
3 meaning from context, this task is difficult for computers. For instance, the phrase Washington can be either a person, location or an organization, and even constraining its type to a location yields over sixty possible different location that are named that way. 3.1 Named entity extraction Named entity extraction is the task of using the surrounding context to isolate the part of text which represents an entity, referred to by a proper name. It is often coupled with entity classification, determining to what class it belongs to, for instance a person or an organization. In general, these are implemented as supervised sequence classifiers. 3.2 Named entity disambiguation Ambiguities, which are inherently present in natural languages represent a challenge of determining the actual identities of entities mentioned in a document (e.g., Paris can refer to a city in France but it can also refer to a small city in Texas, USA or to a 1984 film directed by Wim Wenders having title Paris, Texas). Well-defined entities and relationships are a property of the knowledge model which asserts that a single term has only a single meaning. In that case, we refer to terms as entities. We achieve this property by performing entity resolution. In general, state of the art entity disambiguation systems use three main heuristics: Mention popularity captures the overall most likely meanings of entity phrases. It is typically modelled by the conditional probability of the named entity given a mention. Context similarity: This heuristic captures the entity that best fits the topical context around the mention. It is modelled by the similarity of the mention s context and the entity s context, using a similarity measure operating on a bag-of-words model. The mention s context is a window of words around the mention in the input text, and the entity s context is its description. Coherence: This heuristic collectively captures the entities that make sense appearing together because they are somehow related to one another. While context similarity operates on a single mention-entity pair, the coherence heuristic is collective, operating on the whole input document. It is typically solved by a greedy graph pruning algorithm.
4 3.3 Cross-lingual named entity disambiguation When extending this pipeline into a scenario where the input and the knowledge base are represented in multiple languages, the biggest impact of this change is on the context similarity heuristic. Because it operates on the level of lexical similarity, its output has little meaning when the assumption of a single language is removed. 4 Proposed method We propose a method that incorporates a cross-lingual similarity measure into the framework. Instead of just computing literal context similarity between two contexts of different languages, we use an additional linear mapping that is able to map one vector of bag-of-words features into another such vector in another language. This enables us to perform meaningful similarity computation on the same vector space. The method used in this approach is Regression Canonical Correlation Analysis (rcca), a dimensionality reduction technique operation on two views that finds a linear combination of vectors from both views (languages) that are maximally correlated. The first vector corresponds to the input document, while the second one corresponds to the optimal mapping of it. However, instead of calculating this mapping in advance, we solve the optimization problem for each input document separately around the input document as the initial projection vector. Input text Direct similarity Crosslingual Mapped text Entity Knowledge base mapping Cross similarity Figure 1: The setup of obtaining similarity in cross-lingual NED Figure 1 represents the two ways of obtaining a context similarity measure between an input document and one of the candidate entities. When the languages of the input and the knowledge base are the same, we use direct similarity. When they differ, we first try to map the cross-lingual mapping (green triangle) into a vector
5 space, compatible with the knowledge base. However, using a cross-lingual mapping exposes us to the risk of poor domain coverage. Initial experiments show that because the cross-lingual mapping was not able to map some of the words from the input document, it will have poor performance. Therefore, we interpolate the cross-similarity with the direct similarity with the proportion of the words that the cross-lingual mapping was able to recognize. In pre-processing, we use the Stanford Named Entity Recognizer [9] for English named entity recognition. For Slovene, we have developed a Slovene named entity recognizer using a CRF (Conditional random fields) model trained on the SSJ-500k corpus [9]. 5 Discussion and conclusions Current preliminary experiments show that obtaining a cross-lingual mapping does improve on the context-similarity based NED when the training corpus and the input text share a common topic. However, it is not yet certain whether it compares favourably to a machine translation based system. Current work demonstrates that the interpolation between direct and cross-lingual similarity help the robustness of the systems. Future work will involve evaluating different crosslingual similarity models, as well as transliteration models and data integration issues that arise when dealing with multilingual knowledge bases. References: [1] A. Lommatzsch et al, Named Entity Disambiguation for German News Articles, WIR 2010 [2] Spitkovsky, V.I. and Chang, A.X., Strong baselines for cross-lingual entity linking, TAC 2011 [3] T. Štajner and D. Mladenić: Entity resolution in texts using statistical learning and ontologies, ASWC 2009 [4] J. Rupnik, B. Fortuna. Regression Canonical Correlation Analysis. Learning from Multiple Sources, NIPS Workshop, 2008 [5] Hoffart, J., Yosef, M.A., Bordino, I., Fürstenau, H, Pinkal, M., Spaniol, M., Taneva, B., Thater, S., Weikum, G. (2011). Robust Disambiguation of Named Entities in Text. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, [6] McNamee, P., Mayfield, J., Oard, D. W., Lawrie, D., & Doermann, D. (2011). Cross- Language Entity Linking. IJCNLP 2001, [7] Učni korpus Sporazumevanje v Slovenskem Jeziku, April 2012 [8] Štajner, T., Rusu, D., Dali, L., Fortuna, B., Mladenić, D., Grobelnik, M. A service oriented framework for natural language text enrichment. Informatica (Ljublj.), 2010, vol. 34, no. 3, [9] Jenny Rose Finkel, Trond Grenager, and Christopher Manning Incorporating Nonlocal Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43nd Annual Meeting of the ACL (ACL 2005), pp
6 For wider interest When attempting to understand text, one of the tasks that need to be solved is named entity disambiguation: for instance, Paris can refer to a city in France but it can also refer to a small city in Texas, USA or to a 1984 film directed by Wim Wenders having title Paris, Texas. Knowing the correct answer to that depends on the context. However, context is difficult to interpret if the input text is expressed in a different language than the knowledge base that these entities belong to. This is a very common scenario in processing Slovene text. While using the Slovene Wikipedia for this purpose is easy, it does not contain many entities that we may be interested in. While the English one is over thirty times bigger, it introduces a language barrier. We overcome this by applying techniques from cross-lingual information retrieval to the problem of identifying proper names in text and linking them to concrete knowledge base concepts. Another goal was to re-use language resources from languages with more resource in languages with less available resources. The work presented has resulted in a usable named entity extraction and disambiguation service that is able to work on Slovene text even while having a knowledge base in English. The demonstration is available at
OCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationA heuristic framework for pivot-based bilingual dictionary induction
2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationPostprint.
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationIntroduction to Simulation
Introduction to Simulation Spring 2010 Dr. Louis Luangkesorn University of Pittsburgh January 19, 2010 Dr. Louis Luangkesorn ( University of Pittsburgh ) Introduction to Simulation January 19, 2010 1 /
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationDKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation
DKPro WSD A Generalized UIMA-based Framework for Word Sense Disambiguation Tristan Miller 1 Nicolai Erbs 1 Hans-Peter Zorn 1 Torsten Zesch 1,2 Iryna Gurevych 1,2 (1) Ubiquitous Knowledge Processing Lab
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationActivities, Exercises, Assignments Copyright 2009 Cem Kaner 1
Patterns of activities, iti exercises and assignments Workshop on Teaching Software Testing January 31, 2009 Cem Kaner, J.D., Ph.D. kaner@kaner.com Professor of Software Engineering Florida Institute of
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationThe Good Judgment Project: A large scale test of different methods of combining expert predictions
The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania
More informationReinforcement Learning by Comparing Immediate Reward
Reinforcement Learning by Comparing Immediate Reward Punit Pandey DeepshikhaPandey Dr. Shishir Kumar Abstract This paper introduces an approach to Reinforcement Learning Algorithm by comparing their immediate
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationDiscriminative Learning of Beam-Search Heuristics for Planning
Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University
More informationCalibration of Confidence Measures in Speech Recognition
Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE
More informationDefragmenting Textual Data by Leveraging the Syntactic Structure of the English Language
Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationA cognitive perspective on pair programming
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2006 Proceedings Americas Conference on Information Systems (AMCIS) December 2006 A cognitive perspective on pair programming Radhika
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationKnowledge based expert systems D H A N A N J A Y K A L B A N D E
Knowledge based expert systems D H A N A N J A Y K A L B A N D E What is a knowledge based system? A Knowledge Based System or a KBS is a computer program that uses artificial intelligence to solve problems
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationLearning Methods for Fuzzy Systems
Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8
More informationThe Strong Minimalist Thesis and Bounded Optimality
The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this
More informationMultilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities
Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities Soto Montalvo GAVAB Group URJC Raquel Martínez NLP&IR Group UNED Arantza Casillas Dpt. EE UPV-EHU Víctor Fresno GAVAB
More informationLQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY
More informationOn the Combined Behavior of Autonomous Resource Management Agents
On the Combined Behavior of Autonomous Resource Management Agents Siri Fagernes 1 and Alva L. Couch 2 1 Faculty of Engineering Oslo University College Oslo, Norway siri.fagernes@iu.hio.no 2 Computer Science
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationNotes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1
Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationNCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches
NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches Yu-Chun Wang Chun-Kai Wu Richard Tzong-Han Tsai Department of Computer Science
More informationLanguage Independent Passage Retrieval for Question Answering
Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationTask Tolerance of MT Output in Integrated Text Processes
Task Tolerance of MT Output in Integrated Text Processes John S. White, Jennifer B. Doyon, and Susan W. Talbott Litton PRC 1500 PRC Drive McLean, VA 22102, USA {white_john, doyon jennifer, talbott_susan}@prc.com
More informationProbability and Statistics Curriculum Pacing Guide
Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationelearning OVERVIEW GFA Consulting Group GmbH 1
elearning OVERVIEW 23.05.2017 GFA Consulting Group GmbH 1 Definition E-Learning E-Learning means teaching and learning utilized by electronic technology and tools. 23.05.2017 Definition E-Learning GFA
More informationUSER ADAPTATION IN E-LEARNING ENVIRONMENTS
USER ADAPTATION IN E-LEARNING ENVIRONMENTS Paraskevi Tzouveli Image, Video and Multimedia Systems Laboratory School of Electrical and Computer Engineering National Technical University of Athens tpar@image.
More informationGALICIAN TEACHERS PERCEPTIONS ON THE USABILITY AND USEFULNESS OF THE ODS PORTAL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia GALICIAN TEACHERS PERCEPTIONS ON THE USABILITY AND USEFULNESS OF THE ODS PORTAL SONIA VALLADARES-RODRIGUEZ
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationTraining a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski
Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Problem Statement and Background Given a collection of 8th grade science questions, possible answer
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationLanguage Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus
Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationHow to Judge the Quality of an Objective Classroom Test
How to Judge the Quality of an Objective Classroom Test Technical Bulletin #6 Evaluation and Examination Service The University of Iowa (319) 335-0356 HOW TO JUDGE THE QUALITY OF AN OBJECTIVE CLASSROOM
More informationDisambiguation of Thai Personal Name from Online News Articles
Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online
More informationPIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries
Ina V.S. Mullis Michael O. Martin Eugenio J. Gonzalez PIRLS International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries International Study Center International
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationA Simple VQA Model with a Few Tricks and Image Features from Bottom-up Attention
A Simple VQA Model with a Few Tricks and Image Features from Bottom-up Attention Damien Teney 1, Peter Anderson 2*, David Golub 4*, Po-Sen Huang 3, Lei Zhang 3, Xiaodong He 3, Anton van den Hengel 1 1
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationTextGraphs: Graph-based algorithms for Natural Language Processing
HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006
More informationINPE São José dos Campos
INPE-5479 PRE/1778 MONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEDRAL NETWORK ENVIRONNENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationA Study of Metacognitive Awareness of Non-English Majors in L2 Listening
ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 4, No. 3, pp. 504-510, May 2013 Manufactured in Finland. doi:10.4304/jltr.4.3.504-510 A Study of Metacognitive Awareness of Non-English Majors
More informationLearning Cases to Resolve Conflicts and Improve Group Behavior
From: AAAI Technical Report WS-96-02. Compilation copyright 1996, AAAI (www.aaai.org). All rights reserved. Learning Cases to Resolve Conflicts and Improve Group Behavior Thomas Haynes and Sandip Sen Department
More informationA study of speaker adaptation for DNN-based speech synthesis
A study of speaker adaptation for DNN-based speech synthesis Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King The Centre for Speech Technology Research (CSTR) University of Edinburgh,
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationAnalysis of Enzyme Kinetic Data
Analysis of Enzyme Kinetic Data To Marilú Analysis of Enzyme Kinetic Data ATHEL CORNISH-BOWDEN Directeur de Recherche Émérite, Centre National de la Recherche Scientifique, Marseilles OXFORD UNIVERSITY
More informationCSL465/603 - Machine Learning
CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am
More informationVisit us at:
White Paper Integrating Six Sigma and Software Testing Process for Removal of Wastage & Optimizing Resource Utilization 24 October 2013 With resources working for extended hours and in a pressurized environment,
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationAssessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2
Assessing System Agreement and Instance Difficulty in the Lexical Sample Tasks of SENSEVAL-2 Ted Pedersen Department of Computer Science University of Minnesota Duluth, MN, 55812 USA tpederse@d.umn.edu
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationLearning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries
Learning and Retaining New Vocabularies: The Case of Monolingual and Bilingual Dictionaries Mohsen Mobaraki Assistant Professor, University of Birjand, Iran mmobaraki@birjand.ac.ir *Amin Saed Lecturer,
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More information12- A whirlwind tour of statistics
CyLab HT 05-436 / 05-836 / 08-534 / 08-734 / 19-534 / 19-734 Usable Privacy and Security TP :// C DU February 22, 2016 y & Secu rivac rity P le ratory bo La Lujo Bauer, Nicolas Christin, and Abby Marsh
More informationGrade 2: Using a Number Line to Order and Compare Numbers Place Value Horizontal Content Strand
Grade 2: Using a Number Line to Order and Compare Numbers Place Value Horizontal Content Strand Texas Essential Knowledge and Skills (TEKS): (2.1) Number, operation, and quantitative reasoning. The student
More information