Semantic Domains in Computational Linguistics
|
|
- Rolf Garrison
- 6 years ago
- Views:
Transcription
1 Semantic Domains in Computational Linguistics
2 Alfio Gliozzo Carlo Strapparava Semantic Domains in Computational Linguistics
3 Dr. Alfio Gliozzo FBK-irst Via Sommarive Povo-Trento Italy Dr. Carlo Strapparava FBK-irst Via Sommarive Povo-Trento Italy ISBN e-isbn DOI / Springer Dordrecht Heidelberg London New York Library of Congress Control Number: ACM Computing Classification (1998): 1.2.7, H.3.1, J.5 Springer-Verlag Berlin Heidelberg 2009 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Cover design: KuenkelLopka GmbH Printed on acid-free paper Springer is part of Springer Science+Business Media (
4 Preface Ambiguity and variability are two basic and pervasive phenomena characterizing lexical semantics. In this book we introduce a computational model for lexical semantics based on Semantic Domains. This concept is inspired by the Theory of Semantic Fields, proposed in structural linguistics to explain lexical semantics. The main property of Semantic Domains is lexical coherence, i.e. the property of domain-related words to co-occur in texts. This allows us to define automatic acquisition procedures for Domain Models from corpora, and the acquired models provide a shallow representation for lexical ambiguity and variability. Domain Models have been used to define a similarity metric among texts and terms in the Domain Space, where second-order relations are reflected. Topic similarity estimation is at the basis of text comprehension, allowing us to define a very general domain-driven methodology. The basic argument we put forward to support our approach is that the information provided by the Domain Models can be profitably used to boost the performances of supervised Natural Language Processing systems for many tasks. In fact, Semantic Domains allows us to extract domain features for texts, terms and concepts. The obtained indexing, adopted by the Domain Kernel to estimate topic similarity, preserves the original information while reducing the dimensionality of the feature space. The Domain Kernel is used to define a semi-supervised learning algorithm for Text Categorization that achieves state-of-the-art results while decreasing by one order the quantity of labeled texts required for learning. The property of the Domain Space to represent together terms and texts allows us to define an Intensional Learning schema for Text Categorization, in which categories are described by means of discriminative words instead of labeled examples, achieving performances close to human agreement. Then we investigate the role of domain information in Word Sense Disambiguation, developing both unsupervised and supervised approaches that strongly rely on the notion of Semantic Domain. The former is based on the lexical resource WordNet Domains and the latter exploits both sense tagged and unlabeled data to model the relevant domain distinctions among word senses. The proposed supervised approach improves the
5 VI Preface state-of-the-art performance in many tasks for different languages, while reducing appreciably the amount of sense tagged data required for learning. Finally, we present a lexical acquisition procedure to obtain Multilingual Domain Models from comparable corpora. We exploit such models to approach a Cross-language Text Categorization task, achieving very promising results. We would first of all acknowledge the effort of other people involved in the eight years long daily work required to produce the experimental results reported in this monograph, and in particular Claudio Giuliano, who performed most of the experimental work for the WSD experiments, allowing us to achieve very accurate results in competitions due to his patience and skills; to Bernardo Magnini, who first proposed the concept of Semantic Domain, opening the direction we have followed during our research path and supporting it with financial contributions from his projects; and to Ido Dagan, who greatly contributed to the intensional learning framework defining the experimental settings and clarifying the statistical properties of the GM algorithm. Special thanks are devoted to Oliviero Stock, for his daily encouragement and for the appreciation he has shown for our work; to Walter Daelemans, who demonstrated a real interest in the epistemological aspects of this work from the early stages; to Maurizio Matteuzzi, whose contribution was crucial to interpret the theoretical background of this work related to philosophy of language; to Roberto Basili who immediately understood the potential of Semantic Domains and creatively applied our framework for technology transfer, contributing to highlighting limitations and potentialities; and to Aldo Gangemi, who more recently helped us in clarifying the relationship of this work with formal semantics and knowledge representation. Last, but not least, we would like to thank our families and parents for having understood with patience our crazy lives, and our friends for having spent their nights in esoteric and sympathetic discussions. Trento, September 2008 Alfio Gliozzo Carlo Strapparava
6 Contents 1 Introduction Lexical Semantics and Text Understanding Semantic Domains: Computational Models for Lexical Semantics Structure of the Book Semantic Domains Domain Models Semantic Domains in Text Categorization Semantic Domains in Word Sense Disambiguation Multilingual Domain Models Kernel Methods for Natural Language Processing Semantic Domains The Theory of Semantic Fields Semantic Fields and the meaning-is-use View Semantic Domains The Domain Set WordNet Domains Lexical Coherence: A Bridge from the Lexicon to the Texts Computational Models for Semantic Domains Domain Models Domain Models: Definition The Vector Space Model The Domain Space WordNet-Based Domain Models Corpus-Based Acquisition of Domain Models Latent Semantic Analysis for Term Clustering The Domain Kernel Domain Features in Supervised Learning The Domain Kernel
7 VIII Contents 4 Semantic Domains in Text Categorization Domain Kernels for Text Categorization Semi-supervised Learning in Text Categorization Evaluation Discussion Intensional Learning Intensional Learning for Text Categorization Domain Models and the Gaussian Mixture Algorithm for Intensional Learning Evaluation Discussion Summary Semantic Domains in Word Sense Disambiguation The Word Sense Disambiguation Task The Knowledge Acquisition Bottleneck in Supervised WSD Semantic Domains in the WSD Literature Domain-Driven Disambiguation Methodology Evaluation Domain Kernels for WSD The Domain Kernel Syntagmatic Kernels WSD Kernels Evaluation Discussion Multilingual Domain Models Multilingual Domain Models: Definition Comparable Corpora Cross-language Text Categorization The Multilingual Vector Space Model The Multilingual Domain Kernel Automatic Acquisition of Multilingual Domain Models Evaluation Implementation Details Monolingual Text Categorization Results Cross-language Text Categorization Results Summary Conclusion and Perspectives for Future Research Summary Future Work Consolidation of the Present Work Domain-Driven Technologies
8 Contents IX 7.3 Conclusion A Appendix: Kernel Methods for NLP A.1 Supervised Learning A.2 Feature-Based vs. Instance-Based Learning A.3 Linear Classifiers A.4 Kernel Methods A.5 Kernel Functions A.6 Kernels for Text Processing References
Guide to Teaching Computer Science
Guide to Teaching Computer Science Orit Hazzan Tami Lapidot Noa Ragonis Guide to Teaching Computer Science An Activity-Based Approach Dr. Orit Hazzan Associate Professor Technion - Israel Institute of
More informationLecture Notes in Artificial Intelligence 4343
Lecture Notes in Artificial Intelligence 4343 Edited by J. G. Carbonell and J. Siekmann Subseries of Lecture Notes in Computer Science Christian Müller (Ed.) Speaker Classification I Fundamentals, Features,
More informationMARE Publication Series
MARE Publication Series Volume 8 Series Editors Maarten Bavinck University of Amsterdam, Amsterdam, The Netherlands Svein Jentoft Tromsø, Norway The MARE Publication Series is an initiative of the Centre
More informationInternational Series in Operations Research & Management Science
International Series in Operations Research & Management Science Volume 240 Series Editor Camille C. Price Stephen F. Austin State University, TX, USA Associate Series Editor Joe Zhu Worcester Polytechnic
More information2.1 The Theory of Semantic Fields
2 Semantic Domains In this chapter we define the concept of Semantic Domain, recently introduced in Computational Linguistics [56] and successfully exploited in NLP [29]. This notion is inspired by the
More informationPre-vocational Education in Germany and China
Pre-vocational Education in Germany and China Jun Li Pre-vocational Education in Germany and China A Comparison of Curricula and Its Implications Jun Li Tongji University, Shanghai, People s Republic of
More informationLecture Notes in Artificial Intelligence 7175
Lecture Notes in Artificial Intelligence 7175 Subseries of Lecture Notes in Computer Science LNAI Series Editors Randy Goebel University of Alberta, Edmonton, Canada Yuzuru Tanaka Hokkaido University,
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationMatching Similarity for Keyword-Based Clustering
Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web
More informationPerspectives of Information Systems
Perspectives of Information Systems Springer-Science+ Business Media, LLC Vesa Savolainen Editor and Main Author Perspectives of Information Systems Springer Vesa Savolainen Department of Computer Science
More informationCommunication and Cybernetics 17
Communication and Cybernetics 17 Editors: K. S. Fu W. D. Keidel W. J. M. Levelt H. Wolter Communication and Cybernetics Editors: K.S.Fu, W.D.Keidel, W.1.M.Levelt, H.Wolter Vol. Vol. 2 Vol. 3 Vol. 4 Vol.
More informationAdvances in Mathematics Education
Advances in Mathematics Education Series Editors: Gabriele Kaiser, University of Hamburg, Hamburg, Germany Bharath Sriraman, The University of Montana, Missoula, MT, USA International Editorial Board:
More informationStudying the Lexicon of Dialogue Acts
Studying the Lexicon of Dialogue Acts Nicole Novielli 1, Carlo Strapparava 2 1 Università degli Studi di Bari Dipartimento di Informatica via Orabona 4-70125 Bari, Italy novielli@di.uniba.it 2 FBK- irst,
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationThe MEANING Multilingual Central Repository
The MEANING Multilingual Central Repository J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen January 27, 2004 http://www.lsi.upc.es/ nlp/meaning Jordi Atserias TALP Index
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationGenerative models and adversarial training
Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationTextGraphs: Graph-based algorithms for Natural Language Processing
HLT-NAACL 06 TextGraphs: Graph-based algorithms for Natural Language Processing Proceedings of the Workshop Production and Manufacturing by Omnipress Inc. 2600 Anderson Street Madison, WI 53704 c 2006
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationAdvanced Grammar in Use
Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,
More informationWord Sense Disambiguation
Word Sense Disambiguation D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 May 21, 2009 Excerpt of the R. Mihalcea and T. Pedersen AAAI 2005 Tutorial, at: http://www.d.umn.edu/ tpederse/tutorials/advances-in-wsd-aaai-2005.ppt
More informationNATO ASI Series Advanced Science Institutes Series
NATO ASI Series Advanced Science Institutes Series A series presenting the results of activities sponsored by the NATO Science Committee, which aims at the dissemination of advanced scientific and technological
More informationarxiv: v2 [cs.cv] 30 Mar 2017
Domain Adaptation for Visual Applications: A Comprehensive Survey Gabriela Csurka arxiv:1702.05374v2 [cs.cv] 30 Mar 2017 Abstract The aim of this paper 1 is to give an overview of domain adaptation and
More informationExposé for a Master s Thesis
Exposé for a Master s Thesis Stefan Selent January 21, 2017 Working Title: TF Relation Mining: An Active Learning Approach Introduction The amount of scientific literature is ever increasing. Especially
More informationOntologies vs. classification systems
Ontologies vs. classification systems Bodil Nistrup Madsen Copenhagen Business School Copenhagen, Denmark bnm.isv@cbs.dk Hanne Erdman Thomsen Copenhagen Business School Copenhagen, Denmark het.isv@cbs.dk
More informationLecture 1: Basic Concepts of Machine Learning
Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010
More informationTHE PROMOTION OF SOCIAL AWARENESS
THE PROMOTION OF SOCIAL AWARENESS Powerful Lessons from the Partnership of Developmental Theory and Classroom Practice Robert L. Selman Russell Sage Foundation New York The Russell Sage Foundation The
More informationCOMMUNICATION-BASED SYSTEMS
COMMUNICATION-BASED SYSTEMS COMMUNICATION-BASED SYSTEMS Proceedings of the 3rd International Workshop held at the TU Berlin, Germany, 31 March - 1 April 2000 Edited by GÜNTER HOMMEL Technische Universität
More informationUS and Cross-National Policies, Practices, and Preparation
US and Cross-National Policies, Practices, and Preparation Studies in Educational Leadership VOLUME 12 Series Editor Kenneth A. Leithwood, OISE, University of Toronto, Canada Editorial Board Christopher
More informationMMOG Subscription Business Models: Table of Contents
DFC Intelligence DFC Intelligence Phone 858-780-9680 9320 Carmel Mountain Rd Fax 858-780-9671 Suite C www.dfcint.com San Diego, CA 92129 MMOG Subscription Business Models: Table of Contents November 2007
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationSemantic Inference at the Lexical-Syntactic Level
Semantic Inference at the Lexical-Syntactic Level Roy Bar-Haim Department of Computer Science Ph.D. Thesis Submitted to the Senate of Bar Ilan University Ramat Gan, Israel January 2010 This work was carried
More informationWelcome to. ECML/PKDD 2004 Community meeting
Welcome to ECML/PKDD 2004 Community meeting A brief report from the program chairs Jean-Francois Boulicaut, INSA-Lyon, France Floriana Esposito, University of Bari, Italy Fosca Giannotti, ISTI-CNR, Pisa,
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationSecond Language Learning and Teaching. Series editor Mirosław Pawlak, Kalisz, Poland
Second Language Learning and Teaching Series editor Mirosław Pawlak, Kalisz, Poland About the Series The series brings together volumes dealing with different aspects of learning and teaching second and
More informationPRODUCT PLATFORM AND PRODUCT FAMILY DESIGN
PRODUCT PLATFORM AND PRODUCT FAMILY DESIGN PRODUCT PLATFORM AND PRODUCT FAMILY DESIGN Methods and Applications Edited by Timothy W. Simpson 1, Zahed Siddique 2, and Jianxin (Roger) Jiao 3 1 The Pennsylvania
More informationNatural Language Processing: Interpretation, Reasoning and Machine Learning
Natural Language Processing: Interpretation, Reasoning and Machine Learning Roberto Basili (Università di Roma, Tor Vergata) dblp: http://dblp.uni-trier.de/pers/hd/b/basili:roberto.html Google scholar:
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationEDUCATION IN THE INDUSTRIALISED COUNTRIES
EDUCATION IN THE INDUSTRIALISED COUNTRIES PLAN EUROPE 2000 PUBLISHED UNDER THE AUSPICES OF THE EUROPEAN CULTURAL FOUNDATION PROJECT 1 EDUCATING MAN FOR THE XXIst CENTURY Volume 5 "EDUCATION IN THE INDUSTRIALISED
More informationProceedings of the 19th COLING, , 2002.
Crosslinguistic Transfer in Automatic Verb Classication Vivian Tsang Computer Science University of Toronto vyctsang@cs.toronto.edu Suzanne Stevenson Computer Science University of Toronto suzanne@cs.toronto.edu
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationDivision Strategies: Partial Quotients. Fold-Up & Practice Resource for. Students, Parents. and Teachers
t s e B s B. s Mr Division Strategies: Partial Quotients Fold-Up & Practice Resource for Students, Parents and Teachers c 213 Mrs. B s Best. All rights reserved. Purchase of this product entitles the purchaser
More informationOntological spine, localization and multilingual access
Start Ontological spine, localization and multilingual access Some reflections and a proposal New Perspectives on Subject Indexing and Classification in an International Context International Symposium
More informationSubmission of a Doctoral Thesis as a Series of Publications
Submission of a Doctoral Thesis as a Series of Publications In exceptional cases, and on approval by the Faculty Higher Degree Committee, a candidate for the degree of Doctor of Philosophy may submit a
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationAUTONOMY. in the Law
AUTONOMY in the Law Ius Gentium Comparative Perspectives on Law and Justice VOLUME 1 Series Editor Mortimer Sellers (University of Baltimore) Board of Editors Myroslava Antonovych (Kyiv-Mohyla Academy)
More informationConversational Framework for Web Search and Recommendations
Conversational Framework for Web Search and Recommendations Saurav Sahay and Ashwin Ram ssahay@cc.gatech.edu, ashwin@cc.gatech.edu College of Computing Georgia Institute of Technology Atlanta, GA Abstract.
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationPostprint.
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationThe taming of the data:
The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data
More informationQuickStroke: An Incremental On-line Chinese Handwriting Recognition System
QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents
More informationDistant Supervised Relation Extraction with Wikipedia and Freebase
Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More informationDeveloping Language Teacher Autonomy through Action Research
Developing Language Teacher Autonomy through Action Research Helping teachers engage autonomously in action research is a very worthwhile enterprise. Beneficiaries are likely to include learners, schools
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationSemantic Evidence for Automatic Identification of Cognates
Semantic Evidence for Automatic Identification of Cognates Andrea Mulloni CLG, University of Wolverhampton Stafford Street Wolverhampton WV SB, United Kingdom andrea@wlv.ac.uk Viktor Pekar CLG, University
More informationConducting the Reference Interview:
Conducting the Reference Interview: A How-To-Do-It Manual for Librarians Second Edition Catherine Sheldrick Ross Kirsti Nilsen and Marie L. Radford HOW-TO-DO-IT MANUALS NUMBER 166 Neal-Schuman Publishers,
More informationAutomating Outcome Based Assessment
Automating Outcome Based Assessment Suseel K Pallapu Graduate Student Department of Computing Studies Arizona State University Polytechnic (East) 01 480 449 3861 harryk@asu.edu ABSTRACT In the last decade,
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More information2013/Q&PQ THE SOUTH AFRICAN QUALIFICATIONS AUTHORITY
2013/Q&PQ THE SOUTH AFRICAN QUALIFICATIONS AUTHORITY Policy and Criteria for the Registration of Qualifications and Part Qualifications on the National Qualifications Framework Compiled and produced by:
More informationAnnotation Projection for Discourse Connectives
SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation
More informationStudents Understanding of Graphical Vector Addition in One and Two Dimensions
Eurasian J. Phys. Chem. Educ., 3(2):102-111, 2011 journal homepage: http://www.eurasianjournals.com/index.php/ejpce Students Understanding of Graphical Vector Addition in One and Two Dimensions Umporn
More informationTowards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la
Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)
More informationLanguage Independent Passage Retrieval for Question Answering
Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University
More informationTRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY
TRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY Philippe Hamel, Matthew E. P. Davies, Kazuyoshi Yoshii and Masataka Goto National Institute
More information21st CENTURY SKILLS IN 21-MINUTE LESSONS. Using Technology, Information, and Media
21st CENTURY SKILLS IN 21-MINUTE LESSONS Using Technology, Information, and Media T Copyright 2011 by Saddleback Educational Publishing. All rights reserved. No part of this book may be reproduced in any
More informationLecture Notes in Artificial Intelligence 5972
Lecture Notes in Artificial Intelligence 5972 Edited by R. Goebel, J. Siekmann, and W. Wahlster Subseries of Lecture Notes in Computer Science Norbert E. Fuchs (Ed.) Controlled Natural Language Workshop
More informationDOCTORAL SCHOOL TRAINING AND DEVELOPMENT PROGRAMME
The following resources are currently available: DOCTORAL SCHOOL TRAINING AND DEVELOPMENT PROGRAMME 2016-17 What is the Doctoral School? The main purpose of the Doctoral School is to enhance your experience
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationAn Asset-Based Approach to Linguistic Diversity
Marquette University e-publications@marquette Education Faculty Research and Publications Education, College of 1-1-2007 An Asset-Based Approach to Linguistic Diversity Martin Scanlan Marquette University,
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationEconomics 201 Principles of Microeconomics Fall 2010 MWF 10:00 10:50am 160 Bryan Building
Economics 201 Principles of Microeconomics Fall 2010 MWF 10:00 10:50am 160 Bryan Building Professor: Dr. Michelle Sheran Office: 445 Bryan Building Phone: 256-1192 E-mail: mesheran@uncg.edu Office Hours:
More information