Topic Model Evaluation: How much does it help?
|
|
- Gordon Oliver
- 5 years ago
- Views:
Transcription
1 Topic Model Tutorial at WebSci2016 Topic Model Evaluation: How much does it help? Laura Dietz Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 1
2 Why is this important? Topic Models are computationally demanding to train Is the effort worth it? Isn t there a simpler/faster method that is as good? For multi-component systems: How much do the topics add to the total performance? How to choose K and hyper parameters? How to quantify success? Empirical Evaluation Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 2
3 Do you believe in topic models? Well, we all know that it simply works Compared to placebo? It provides pretty pictures therefore it must work I WANT TO BELIEVE Different data / task? A scientific study showed it works, no need to test it ever again. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 3
4 Outline What is *not* an evaluation? Intrinsic evaluation Through holdout-log likelihood / perplexity With human-in-the-loop (word intrusion) Extrinsic evaluation Through classification test data Task-specific metric What to compare to? - Baselines Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 4
5 THIS IS *NOT* AN EVALUATION Banana picture licensed under CreativeCommons by-nc-sa by Viktor Hertz Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 5
6 Looking at Word clouds Which of these topics is better? Common: some correct topics, but many split or merged topics Did the author hand-pick the correct topics? Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 6
7 Looking at Highlighted Documents Which of these highlights is better? Soccer ball goal referee soccer foot coach scandal finances news team fire coach president Soccer ball goal referee soccer foot coach scandal finances news team fire coach president Humans prefer long consecutive segments with same topics Is the segment topically coherent? How about relative clauses? Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 7
8 Looking at Colored graphs Which of these graph colorations is better? Is color correlating with what we think it does? What if I told you I assigned random colors? (=placebo) Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 8
9 Danger of Human Nature We want it to work We over-interpret the story told by the visualization We corroborate a narrative that fits the results Licensed under CreativeCommons by-sa by JussiClone Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 9
10 Objective Measure Goal: Quantify what quality means! Issues No gold standard data available. Vague definition of topic. Multiple correct answers. Uncertain inference algorithms. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 10
11 Designing a User Interface Study Claim: Topic model visualizations help users perform a task better. Run random trial evaluating humans Compare to humans that get Placebo visualizations. First study: Make sure you design the right thing Second study: Design it in the right way Do not assume your assumptions to be true! Details out of scope. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 11
12 Intrinsic Evaluation HELD-OUT LIKELIHOOD & PERPLEXITY Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 12
13 Held-out Log Likelihood / Perplexity For the words in the test document, what is their probability under the (pre-trained) topic model? train θ, φ test Lower is better! log p θ, φ) How to get θ? Depends on doc! The lower, the better model captures patterns of natural language. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 13
14 Variant: Document Completion For the actual words in the test document, what is their probability under the (pre-trained) topic model? train test θ, φ Lower is better! log p θ, φ) p θ ) The lower, the better model captures patterns of natural language. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 14
15 Variations on Held-out Log Likelihood Perplexity: 2 log p Per-word measure: 1 words log p(words) Many ways to obtain, see Wallach et al Wallach, Hanna M., et al. "Evaluation methods for topic models." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 15
16 Topic Coherence Select words with high probability in one topic T cat lion tiger puma All word pairs: In how many documents contain both words? cat lion cat lion cat lion Topic Coherence(T) = w1,w2 log docs with both words +1 docs with word w 2 Mimno et al. "Optimizing semantic coherence in topic models." EMNLP Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 16
17 Intrinsic Evaluation HUMAN-IN-THE-LOOP Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 17
18 Word Intrusion Let a human guess which word does not fit. Human does not know the topic s word lists cat lion tiger puma apple High probability words Under topic T True intruder word = Low prob. under T High prob. under other topic Assumption: Humans guess right = Topic model is good! Model precision = correct guesses of true intruder all guesses Higher is better! Best = 1 Chang, Jonathan, et al. "Reading tea leaves: How humans interpret topic models." Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 18
19 Topic Intrusion Let human guess which topic does not fit. θ T1 = 0.5 θ T2 = 0.25 θ T5 = 0.01 Soccer ball goal referee coach scandal finances news team fire coach president T1: soccer referee goal T2: scandal finances coach guess T5: news paper stock True intruder Human does not know topic proportions Assumption: Human guesses right = Topic model good! Topic log odds = log θ(true intruder) θ(guess) = log = 3.4 Higher is better! Best = 0 Chang, Jonathan, et al. "Reading tea leaves: How humans interpret topic models." Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 20
20 Sadly: Metrics do not Always Agree better better Chang, Jonathan, et al. "Reading tea leaves: How humans interpret topic models." Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 21
21 Extrinsic Evaluation CLASSIFICATION TEST SET Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 22
22 Use Classification Dataset sports politics gossip Count how often a topic and a class match! sports politics gossip Topic Topic Topic Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 23
23 Classification: Topic Class alignment Want to compute accuracy, precision, recall, F1 How to align topics and classes? topic distribution θ multiple class labels per document Solutions: <-> sports <-> politics <-> gossip Highest agreement by KL divergence Purity: All documents in one topic vote on a class Issue: What if one topic aligns to two classes (vice versa)? Split vote proportionally Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 24
24 Pairwise Accuracy: RAND Index / BLANC measure For every pair of documents, are they: Associated with the same topic? Yes / No Associated with the same class? Yes / No Count cases of agreement or confusion table sports A B C gossip D Topic = yes Class = yes sports Class=no sports A B B D Topic = no gossip sports C D B C Compute Accuracy, Precision, Recall, F1 Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 25
25 Downsides Will indicate success only if correspondence between Unsupervised topics Supervised classes But: good/useful topics do not have to align with classes Therefore: we might get bad scores, even of the topics are good. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 26
26 test train Supervised Classification with Topic Model Features Supervised classifiers represent each document as feature vector Use topic model as features! True classes Topic Features Predicted classes sports θ f = θ d1 (t1) θ d1 (t2) θ d1 (t3) sports sports θ f = θ d2 (t1) θ d2 (t2) θ d2 (t3) C politics politics? θ f = θ d3 (t1) θ d3 (t2) θ d3 (t3) politics If classification performance improves => topic model is good! Use k-fold cross validation Baseline? E.g. word features Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 27
27 The Problem with Topic Model Features Unfortunately, unsupervised topic model features are often outperformed by simple word-based features (e.g., TF-IDF). Example: Predicting scientific disciplines (physics, history, etc.). Rocchio: centroid vector per class; classify by cosine similarity (words versus LDA) Nanni, Glavas, Ponzetto, Dietz. Under submission. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 28
28 Extrinsic Evaluation YOUR TASK METRIC Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 29
29 Example: Prediction of Citation Influences Example: Given citation graph with paper abstracts X= paper, Y = influence strength of citations. Gold data: Ask authors to mark strengths with ++, +, -, -- Dietz, Bickel, Scheffer. Unsupervised Prediction of Citation Influences, ICML Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 30
30 Your Task Specification Here! Task: Given input X, predict output Y Your approach: Use topic models to make a prediction Baseline approach: Use something else to make a prediction Claim: With topic model is better than without topic model X Topic Model inside! Y Base line Y Better than Y? Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 31
31 Example: Biomedical Question Answering other Might be upper bound Topic model inside Atkinson, Montecinos, Curtis. Question-driven topic-based extraction of Protein- Protein Interaction Methods from biomedical literature. Information Sciences Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 32
32 Which Baselines to Compare to? Similarity of documents Kullback-Leibler Divergence or Cosine similarity of words Word cluster: K-means on words / Agglomerative Clustering Matrix Factorization Word Embeddings (e.g. Word2Vec) Source of topics: Thesaurus / Word sense dictionary, e.g. Wordnet Topic = Wikipedia categories (or words from articles) Topic = Twitter Hashtags Topic Features in Classification: Rocchio, K-Nearest Neighbor, Support Vector Machines Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 33
33 CONCLUSION Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 34
34 References Hold-out likelihood / Perplexity Wallach, Hanna M., et al. "Evaluation methods for topic models." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, Coherence measure Mimno, David, et al. "Optimizing semantic coherence in topic models." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Human-in-the-loop evaluation Chang, Jonathan, et al. "Reading tea leaves: How humans interpret topic models." Advances in neural information processing systems Classification Data: Pairwise measures Recasens, Marta, and Eduard Hovy. "BLANC: Implementing the Rand index for coreference evaluation." Natural Language Engineering (2011): Variation of Information Meilă, Marina. "Comparing clusterings by the variation of information." Learning theory and kernel machines. Springer Berlin Heidelberg, More metrics (on related task word embeddings) Schnabel, Tobias, et al. "Evaluation methods for unsupervised word embeddings." Proc. of EMNLP Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 35
35 Conclusion User Study on visualizations Need: Experts & many humans Hold-out likelihood / Perplexity Need: only documents Human-in-the-loop: Word / Topic Intrusion Need: humans Classification performance Need: documents with class labels Please measure! Task metric Depends on the what your model is good for Thank you. Laura Dietz, Universität Mannheim Topic Model Evaluation: How much does it Slide 36
Lecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More informationTwitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationIterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages
Iterative Cross-Training: An Algorithm for Learning from Unlabeled Web Pages Nuanwan Soonthornphisaj 1 and Boonserm Kijsirikul 2 Machine Intelligence and Knowledge Discovery Laboratory Department of Computer
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationGeorgetown University at TREC 2017 Dynamic Domain Track
Georgetown University at TREC 2017 Dynamic Domain Track Zhiwen Tang Georgetown University zt79@georgetown.edu Grace Hui Yang Georgetown University huiyang@cs.georgetown.edu Abstract TREC Dynamic Domain
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationUnsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model
Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationTraining a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski
Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Problem Statement and Background Given a collection of 8th grade science questions, possible answer
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationComment-based Multi-View Clustering of Web 2.0 Items
Comment-based Multi-View Clustering of Web 2.0 Items Xiangnan He 1 Min-Yen Kan 1 Peichu Xie 2 Xiao Chen 3 1 School of Computing, National University of Singapore 2 Department of Mathematics, National University
More informationWelcome to. ECML/PKDD 2004 Community meeting
Welcome to ECML/PKDD 2004 Community meeting A brief report from the program chairs Jean-Francois Boulicaut, INSA-Lyon, France Floriana Esposito, University of Bari, Italy Fosca Giannotti, ISTI-CNR, Pisa,
More informationCSL465/603 - Machine Learning
CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationLanguage Acquisition Fall 2010/Winter Lexical Categories. Afra Alishahi, Heiner Drenhaus
Language Acquisition Fall 2010/Winter 2011 Lexical Categories Afra Alishahi, Heiner Drenhaus Computational Linguistics and Phonetics Saarland University Children s Sensitivity to Lexical Categories Look,
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationLearning to Rank with Selection Bias in Personal Search
Learning to Rank with Selection Bias in Personal Search Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork Google Inc. Mountain View, CA 94043 {xuanhui, bemike, metzler, najork}@google.com ABSTRACT
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationTopicFlow: Visualizing Topic Alignment of Twitter Data over Time
TopicFlow: Visualizing Topic Alignment of Twitter Data over Time Sana Malik, Alison Smith, Timothy Hawes, Panagis Papadatos, Jianyu Li, Cody Dunne, Ben Shneiderman University of Maryland, College Park,
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationarxiv: v2 [cs.cv] 30 Mar 2017
Domain Adaptation for Visual Applications: A Comprehensive Survey Gabriela Csurka arxiv:1702.05374v2 [cs.cv] 30 Mar 2017 Abstract The aim of this paper 1 is to give an overview of domain adaptation and
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationAxiom 2013 Team Description Paper
Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association
More informationADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF
Read Online and Download Ebook ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF Click link bellow and free register to download
More informationTesting A Moving Target: How Do We Test Machine Learning Systems? Peter Varhol Technology Strategy Research, USA
Testing A Moving Target: How Do We Test Machine Learning Systems? Peter Varhol Technology Strategy Research, USA Testing a Moving Target How Do We Test Machine Learning Systems? Peter Varhol, Technology
More informationLaboratorio di Intelligenza Artificiale e Robotica
Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationarxiv: v2 [cs.ir] 22 Aug 2016
Exploring Deep Space: Learning Personalized Ranking in a Semantic Space arxiv:1608.00276v2 [cs.ir] 22 Aug 2016 ABSTRACT Jeroen B. P. Vuurens The Hague University of Applied Science Delft University of
More informationThe Role of String Similarity Metrics in Ontology Alignment
The Role of String Similarity Metrics in Ontology Alignment Michelle Cheatham and Pascal Hitzler August 9, 2013 1 Introduction Tim Berners-Lee originally envisioned a much different world wide web than
More informationA New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation
A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation SLSP-2016 October 11-12 Natalia Tomashenko 1,2,3 natalia.tomashenko@univ-lemans.fr Yuri Khokhlov 3 khokhlov@speechpro.com Yannick
More informationLinking the Ohio State Assessments to NWEA MAP Growth Tests *
Linking the Ohio State Assessments to NWEA MAP Growth Tests * *As of June 2017 Measures of Academic Progress (MAP ) is known as MAP Growth. August 2016 Introduction Northwest Evaluation Association (NWEA
More informationSemantic and Context-aware Linguistic Model for Bias Detection
Semantic and Context-aware Linguistic Model for Bias Detection Sicong Kuang Brian D. Davison Lehigh University, Bethlehem PA sik211@lehigh.edu, davison@cse.lehigh.edu Abstract Prior work on bias detection
More informationFeature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes
Feature Selection based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification using Naïve Bayes Viviana Molano 1, Carlos Cobos 1, Martha Mendoza 1, Enrique Herrera-Viedma 2, and
More informationWhy Did My Detector Do That?!
Why Did My Detector Do That?! Predicting Keystroke-Dynamics Error Rates Kevin Killourhy and Roy Maxion Dependable Systems Laboratory Computer Science Department Carnegie Mellon University 5000 Forbes Ave,
More informationBUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING
BUILDING CONTEXT-DEPENDENT DNN ACOUSTIC MODELS USING KULLBACK-LEIBLER DIVERGENCE-BASED STATE TYING Gábor Gosztolya 1, Tamás Grósz 1, László Tóth 1, David Imseng 2 1 MTA-SZTE Research Group on Artificial
More informationActive Learning. Yingyu Liang Computer Sciences 760 Fall
Active Learning Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven,
More informationData Integration through Clustering and Finding Statistical Relations - Validation of Approach
Data Integration through Clustering and Finding Statistical Relations - Validation of Approach Marek Jaszuk, Teresa Mroczek, and Barbara Fryc University of Information Technology and Management, ul. Sucharskiego
More information(Sub)Gradient Descent
(Sub)Gradient Descent CMSC 422 MARINE CARPUAT marine@cs.umd.edu Figures credit: Piyush Rai Logistics Midterm is on Thursday 3/24 during class time closed book/internet/etc, one page of notes. will include
More informationLaboratorio di Intelligenza Artificiale e Robotica
Laboratorio di Intelligenza Artificiale e Robotica A.A. 2008-2009 Outline 2 Machine Learning Unsupervised Learning Supervised Learning Reinforcement Learning Genetic Algorithms Genetics-Based Machine Learning
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More informationAs a high-quality international conference in the field
The New Automated IEEE INFOCOM Review Assignment System Baochun Li and Y. Thomas Hou Abstract In academic conferences, the structure of the review process has always been considered a critical aspect of
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationCROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2
1 CROSS-LANGUAGE INFORMATION RETRIEVAL USING PARAFAC2 Peter A. Chew, Brett W. Bader, Ahmed Abdelali Proceedings of the 13 th SIGKDD, 2007 Tiago Luís Outline 2 Cross-Language IR (CLIR) Latent Semantic Analysis
More informationA survey of multi-view machine learning
Noname manuscript No. (will be inserted by the editor) A survey of multi-view machine learning Shiliang Sun Received: date / Accepted: date Abstract Multi-view learning or learning with multiple distinct
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationIssues in the Mining of Heart Failure Datasets
International Journal of Automation and Computing 11(2), April 2014, 162-179 DOI: 10.1007/s11633-014-0778-5 Issues in the Mining of Heart Failure Datasets Nongnuch Poolsawad 1 Lisa Moore 1 Chandrasekhar
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationTRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY
TRANSFER LEARNING IN MIR: SHARING LEARNED LATENT REPRESENTATIONS FOR MUSIC AUDIO CLASSIFICATION AND SIMILARITY Philippe Hamel, Matthew E. P. Davies, Kazuyoshi Yoshii and Masataka Goto National Institute
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationA Bayesian Learning Approach to Concept-Based Document Classification
Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors
More informationCOBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints Toon Van Craenendonck, Sebastijan Dumančić and Hendrik Blockeel Department of Computer Science, KU Leuven, Belgium {firstname.lastname}@kuleuven.be
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationAttributed Social Network Embedding
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, MAY 2017 1 Attributed Social Network Embedding arxiv:1705.04969v1 [cs.si] 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua Abstract Embedding
More informationA Semantic Similarity Measure Based on Lexico-Syntactic Patterns
A Semantic Similarity Measure Based on Lexico-Syntactic Patterns Alexander Panchenko, Olga Morozova and Hubert Naets Center for Natural Language Processing (CENTAL) Université catholique de Louvain Belgium
More informationIntroduction to Causal Inference. Problem Set 1. Required Problems
Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not
More informationUnsupervised Cross-Lingual Scaling of Political Texts
Unsupervised Cross-Lingual Scaling of Political Texts Goran Glavaš and Federico Nanni and Simone Paolo Ponzetto Data and Web Science Group University of Mannheim B6, 26, DE-68159 Mannheim, Germany {goran,
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More information11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationDiscovery of Topical Authorities in Instagram
Discovery of Topical Authorities in Instagram Aditya Pal, Amaç Herdağdelen, Sourav Chatterji, Sumit Taank, Deepayan Chakrabarti Facebook {apal,amac,sourav,staank}@fb.com, deepay@utexas.edu ABSTRACT Instagram
More informationWhat is a Mental Model?
Mental Models for Program Understanding Dr. Jonathan I. Maletic Computer Science Department Kent State University What is a Mental Model? Internal (mental) representation of a real system s behavior,
More informationA DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF GRAPH DATA
International Journal of Semantic Computing Vol. 5, No. 4 (2011) 433 462 c World Scientific Publishing Company DOI: 10.1142/S1793351X1100133X A DISTRIBUTIONAL STRUCTURED SEMANTIC SPACE FOR QUERYING RDF
More informationClassify: by elimination Road signs
WORK IT Road signs 9-11 Level 1 Exercise 1 Aims Practise observing a series to determine the points in common and the differences: the observation criteria are: - the shape; - what the message represents.
More informationTerm Weighting based on Document Revision History
Term Weighting based on Document Revision History Sérgio Nunes, Cristina Ribeiro, and Gabriel David INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto. Rua Dr. Roberto Frias, s/n. 4200-465
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationFinding Translations in Scanned Book Collections
Finding Translations in Scanned Book Collections Ismet Zeki Yalniz Dept. of Computer Science University of Massachusetts Amherst, MA, 01003 zeki@cs.umass.edu R. Manmatha Dept. of Computer Science University
More informationOn document relevance and lexical cohesion between query terms
Information Processing and Management 42 (2006) 1230 1247 www.elsevier.com/locate/infoproman On document relevance and lexical cohesion between query terms Olga Vechtomova a, *, Murat Karamuftuoglu b,
More informationDegree Qualification Profiles Intellectual Skills
Degree Qualification Profiles Intellectual Skills Intellectual Skills: These are cross-cutting skills that should transcend disciplinary boundaries. Students need all of these Intellectual Skills to acquire
More informationAnalyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio
SCSUG Student Symposium 2016 Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio Praneth Guggilla, Tejaswi Jha, Goutam Chakraborty, Oklahoma State
More informationEvidence-based Practice: A Workshop for Training Adult Basic Education, TANF and One Stop Practitioners and Program Administrators
Evidence-based Practice: A Workshop for Training Adult Basic Education, TANF and One Stop Practitioners and Program Administrators May 2007 Developed by Cristine Smith, Beth Bingman, Lennox McLendon and
More information