Stream Mining Using Statistical Relational Learning

Stream Mining Using Statistical Relational Learning
Swarup Chandra, Justin Sahs, Latifur Khan, Bhavani Thuraisingham, and Charu Aggarwal*
Department of Computer Science, The University of Texas at Dallas
*IBM T. J. Watson Research Center, NY, USA
12/15/2014

Introduction: Streaming Data Classification
Classification of data instances that arrive continuously in a stream, generated by a non-stationary process.
Examples: web search, social media, sensors, communication.

Stream Classification
What are the challenges for classification in a stream?
1. Concept drift: the data distribution changes over time.
2. Storage limitation: insufficient space to store all data.

Motivation: Current Approach
Chunk-based ensemble model with adaptive learning.1
However, these approaches assume the data to be independent and identically distributed (i.i.d.), which may not always hold in a stream.
1. Al-Khateeb, T., Masud, M. M., Khan, L., Aggarwal, C. C., Han, J., & Thuraisingham, B. M. (2012, December). Stream Classification with Recurring and Novel Class Detection Using Class-Based Ensemble. In ICDM (pp. 31-40).

Motivation: Examples
Relational data arises in many settings, e.g., hyperlinks between web pages (source: Wikipedia) and entities connected by key relationships in a relational database schema.

Motivation: What are we looking for?
1. Leverage existing domain knowledge and relationships to perform better classification.
2. Handle uncertainty in the class distribution.
3. Overcome the challenges of stream classification.
Overview: We express domain knowledge using a language from the field of Statistical Relational Learning called a Markov Logic Network (MLN).

Markov Logic Network
A language that combines First-Order Logic (FOL) and Markov Networks.2
MLNs are well suited for:
1. Representing domain knowledge in FOL
2. Relational data
3. A compact template structure
Learning task: learn the weights associated with the FOL formulas from data.
2. Domingos, P., & Lowd, D. (2009). Markov logic: An interface layer for artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1), 1-155.
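An MLN defines a probability distribution over possible worlds: P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where n_i(x) is the number of true groundings of formula i in world x. A minimal sketch of this computation on a toy MLN (the formula, predicates, and weight here are illustrative, not from the paper):

```python
import math
from itertools import product

def n_implication(world, objs):
    """Count true groundings of the toy formula: Smokes(x) => Cancer(x)."""
    return sum(1 for x in objs
               if (not world[("Smokes", x)]) or world[("Cancer", x)])

def world_probabilities(objs, weight):
    """Enumerate all worlds and return (world, probability) pairs."""
    atoms = [(p, x) for p in ("Smokes", "Cancer") for x in objs]
    worlds = [dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms))]
    # Unnormalized weight of each world: exp(w * n(world))
    scores = [math.exp(weight * n_implication(w, objs)) for w in worlds]
    z = sum(scores)  # partition function Z
    return [(w, s / z) for w, s in zip(worlds, scores)]

probs = world_probabilities(["A"], weight=1.5)
# The one world that violates the implication (Smokes(A) true, Cancer(A)
# false) gets unnormalized weight e^0 = 1; the other three get e^1.5.
```

With a single object there are four worlds; the violating world is not impossible, only less probable, which is the key difference between MLNs and hard FOL constraints.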

Stream Mining using MLN: Challenges
1. Finite domain size.
2. Choice of MLN formula: a large, complex formula increases computational time; a small formula may not sufficiently capture relations.
3. Weight learning may be too slow.
4. Chunk size: a large size increases computational time; a small size may not capture concept drift well.

Stream Mining using MLN: Addressing these challenges
1. Propose single-model incremental weight learning.
2. Propose selective weight learning.
3. Discretize the domain.
4. Limit the number of predicates in each formula.
5. Empirically estimate the best chunk size.

Our Approach: Basic Algorithm
An initial MLN is built from user-supplied domain knowledge. Each chunk of the data stream is then processed in three steps:
1. Discretize: chunk_i is discretized into dchunk_i.
2. Inference: dchunk_i is classified with the current model MLN_{i-1}, and the error is recorded (from the second chunk onward).
3. Weight learning: the model is updated on dchunk_i to produce MLN_i (for the first chunk, starting from the initial MLN).
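The per-chunk loop above can be sketched as follows, assuming three black-box operations; the function names are hypothetical stand-ins, not Alchemy API calls:

```python
def process_stream(chunks, initial_mln, discretize, infer, learn_weights):
    """Classify each chunk with the current MLN, then update its weights."""
    mln, errors = initial_mln, []
    for i, chunk in enumerate(chunks):
        dchunk = discretize(chunk)             # map real-valued attributes to bins
        if i > 0:                              # the first chunk only trains the model
            errors.append(infer(mln, dchunk))  # classify with MLN_{i-1}, record error
        mln = learn_weights(mln, dchunk)       # produce MLN_i from MLN_{i-1} and chunk i
    return mln, errors
```

For example, with trivial stand-ins (identity discretization, constant error, and a learner that tags the model) the loop yields one updated model per chunk and one error measurement per chunk after the first.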

Selective Learning
Do we need weight learning at every chunk? Weight learning is expensive, and the change in data distribution may not be significant.
Kullback-Leibler Distance
P_M(z): the current chunk's probability distribution of attribute a = z.
P_Q(z): the overall probability distribution of attribute a = z.
KL_a = Σ_{z ∈ D} P_M(z) log( P_M(z) / P_Q(z) )
d_a = | KL_a^prev − KL_a^curr | / KL_a^prev
Perform weight learning on the current chunk if d_a > Threshold for some attribute a; otherwise skip it.
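The selective-learning test can be sketched directly from these definitions. This is a minimal sketch: distributions are dicts over discretized attribute values, and the threshold and example distributions are illustrative:

```python
import math

def kl_distance(p_m, p_q):
    """KL_a = sum over z of P_M(z) * log(P_M(z) / P_Q(z)).

    Zero-mass values in P_M are skipped; assumes P_Q(z) > 0 wherever
    P_M(z) > 0 (guaranteed here since the chunk is part of the stream).
    """
    return sum(p * math.log(p / p_q[z]) for z, p in p_m.items() if p > 0)

def needs_weight_learning(kl_prev, kl_curr, threshold):
    """d_a = |KL_prev - KL_curr| / KL_prev; learn only if d_a > threshold."""
    d_a = abs(kl_prev - kl_curr) / kl_prev
    return d_a > threshold

overall = {"x": 0.5, "y": 0.5}                 # P_Q: overall stream distribution
kl1 = kl_distance({"x": 0.9, "y": 0.1}, overall)  # skewed chunk: large KL
kl2 = kl_distance({"x": 0.5, "y": 0.5}, overall)  # chunk matches overall: KL = 0
```

A chunk whose distribution matches the overall stream contributes zero KL distance, so a large relative drop (or jump) between consecutive chunks is what triggers a fresh round of weight learning.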

Our Approach: Algorithm with Selective Learning
Chunk 1: discretize into dchunk_1; compute the per-attribute KL values KL_1 = {KL_A, KL_B}; set ChunkDist = dchunk_1 and OverAllDist = ChunkDist; perform weight learning from the initial MLN to obtain MLN_1.
Chunk i > 1: discretize into dchunk_i; compute KL_i against OverAllDist; compute the distance d_a between the previous and current KL values. If d_a is below the threshold (e.g., chunk 2), skip weight learning and keep the current MLN; if it exceeds the threshold (e.g., chunk 3), perform weight learning to obtain a new MLN. In both cases, add the chunk's distribution into OverAllDist.

Datasets3

Dataset      | Attributes (Total / Discrete / Real-valued) | Classification Problem
ForestCover  | 55 / 45 / 10 | forest cover type
Airline      | 8 / 6 / 2    | schedule delay
Poker        | 11 / 11 / 0  | poker hand
SyntheticLED | 8 / 8 / 0    | digit displayed

100,000 instances in each dataset.
3. http://moa.cms.waikato.ac.nz/datasets/

MLN Formulas: Domain Knowledge
How did we embed domain knowledge? As relationships between the class attribute and other attributes.
Example: ForestCover dataset; class attribute: Cover Type. Neota (wilderness area 2) would have spruce/fir (cover type 1):4
WildernessArea(o, 2) => CoverType(o, 1)
4. https://archive.ics.uci.edu/ml/datasets/covertype
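In Alchemy's input format, such a rule is written as a weighted first-order formula. The sketch below is illustrative only: the predicate declarations, constant names, and the weight shown are assumptions, and in this work the weights are learned from the data rather than fixed by hand:

```
// Predicate declarations over typed domains (illustrative)
WildernessArea(obj, area)
CoverType(obj, type)

// Neota (area 2) tends to contain spruce/fir (cover type 1);
// the leading number is the formula's weight
1.5  WildernessArea(o, A2) => CoverType(o, T1)
```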

Experiments: Error Analysis
Weight learning and inference are performed using the Alchemy5 toolkit. We compare classification error against state-of-the-art stream classifiers from the MOA6 toolkit.
5. http://alchemy.cs.washington.edu/
6. http://moa.cms.waikato.ac.nz/

Results: Error Analysis (% error; chunk size in parentheses)

Classifier | ForestCover (500) | Airline (500) | Poker (750) | SyntheticLED (750)
MLN | 13.59 | 35.04 | 8.86 | 25.94
HMLN | 26.87 | 37.4 | - | -
SluiceBox | 26.9 | 45.2 | 49.9 | 89.9
Hoeffding Tree | 13.6 | 40.97 | 48.27 | 28.4
NaïveBayes | 22.0 | 41.07 | 50.2 | 27.6
Perceptron | 21.4 | 46.43 | 48.27 | 26.8
SGD | 6.6 | 44.13 | 47.87 | 87.73
SingleClassifierDrift | 12.8 | 41.33 | 48.27 | 29.07
OzaBoost-Adwin | 6.0 | 41.27 | 51.6 | 28.53
Accuracy-Updated-Ensemble27 | 7.8 | 35.7 | 48.13 | 27.33

7. Brzezinski, D., & Stefanowski, J. (2014). Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems, 25(1), 81-94.


Addressing a Limitation
Major limitation: weight learning is slow due to multiple iterations.

Dataset | Without SL: Time (s) / Error (%) | Threshold (%) | With Selective Learning (SL): Time (s) / Error (%)
ForestCover | 72.28 / 13.59 | 5 | 66.6 / 13.64
ForestCover | | 10 | 61.4 / 13.81
Airline | 8.09 / 35.04 | 5 | 7.94 / 34.96
Airline | | 10 | 7.32 / 35.09
Poker | 278.6 / 8.86 | 5 | 189.77 / 9.06
Poker | | 10 | 189.95 / 9.06
SyntheticLED | 85.03 / 25.94 | 5 | 60.19 / 25.99
SyntheticLED | | 10 | 59.9 / 25.99

Conclusion
1. Adaptation of Markov Logic Networks for stream mining.
2. Evaluation of an incremental weight-learning approach.
3. Use of domain knowledge outperforms state-of-the-art approaches.

Thank you!
Datasets and MLNs are available at http://utdallas.edu/~swarup.chandra/