Stream Mining Using Statistical Relational Learning

Swarup Chandra, Justin Sahs, Latifur Khan, Bhavani Thuraisingham and Charu Aggarwal*
Department of Computer Science, The University of Texas at Dallas
*IBM T. J. Watson Research Center, NY, USA
12/15/2014

Introduction: Streaming Data Classification
Classification of data instances that arrive continuously in a stream generated by a non-stationary process, for example from web search, social media, sensors and communication.

Stream Classification
What are the challenges for classification in a stream?
1. Concept drift: the data distribution changes over time.
2. Storage limitation: there is insufficient space to store all the data.

Motivation
Current approach: a chunk-based ensemble model with adaptive learning.[1]
However, such approaches assume the data to be independent and identically distributed (i.i.d.), which may not always hold.
[1] Al-Khateeb, T., Masud, M. M., Khan, L., Aggarwal, C. C., Han, J., & Thuraisingham, B. M. (2012, December). Stream Classification with Recurring and Novel Class Detection Using Class-Based Ensemble. In ICDM (pp ).

Motivation: Examples
[Figures: web links; an example sourced from Wikipedia; a relational database schema of three entities (Entity Name1 to Entity Name3), each with a primary key (PK) and several attributes, some attributes shared between entities.]

Motivation: What Are We Looking For?
1. Leverage existing domain knowledge and relationships to perform better classification.
2. Handle uncertainty in the class distribution.
3. Overcome the challenges of stream classification.
Overview: express domain knowledge using Markov Logic Network (MLN), a language from the field of Statistical Relational Learning.

Markov Logic Network
Markov Logic Network (MLN) is a language that combines first-order logic (FOL) and Markov networks.[2] MLNs are well suited to representing domain knowledge in FOL, to relational data, and to expressing models through a compact template structure.
Learning task: learn the weights associated with the FOL formulas from data.
[2] Domingos, P., & Lowd, D. (2009). Markov logic: An interface layer for artificial intelligence. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1),
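To make the weight-learning task concrete: an MLN defines $P(X = x) = \frac{1}{Z}\exp\left(\sum_i w_i n_i(x)\right)$, where $n_i(x)$ is the number of true groundings of formula $i$ in world $x$ and $Z$ normalizes over all worlds (Domingos & Lowd). The minimal sketch below enumerates the four worlds of a hypothetical one-object MLN whose single formula is the ForestCover rule WildernessArea(o, 2) ⇒ CoverType(o, 1), read here as an implication. The weight values are illustrative assumptions, not learned weights.

```python
import itertools
import math


def world_probabilities(weight):
    """Exact distribution of a toy MLN over two ground atoms.

    Atoms for one hypothetical object o:
      a = WildernessArea(o, 2), b = CoverType(o, 1)
    Single formula (assumed to be an implication): a => b, with `weight`.
    """
    scores = {}
    for a, b in itertools.product([False, True], repeat=2):
        # n(x): number of true groundings of the formula in this world (0 or 1)
        n_true = 1 if (not a) or b else 0
        scores[(a, b)] = math.exp(weight * n_true)
    z = sum(scores.values())  # partition function Z
    return {world: score / z for world, score in scores.items()}


def prob_cover_given_area(weight):
    """P(CoverType(o, 1) = True | WildernessArea(o, 2) = True)."""
    p = world_probabilities(weight)
    return p[(True, True)] / (p[(True, True)] + p[(True, False)])


if __name__ == "__main__":
    for w in (0.0, 1.0, 3.0):
        print(f"w={w}: P(cover=1 | area=2) = {prob_cover_given_area(w):.3f}")
```

For this formula the conditional reduces to the logistic function $1 / (1 + e^{-w})$: with $w = 0$ the formula has no effect, and larger weights make worlds that violate it exponentially less likely.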

Stream Mining Using MLN: Challenges
1. Finite domain size.
2. Choice of MLN formulas: large and complex formulas increase computational time, while small formulas may not sufficiently capture the relations.
3. Weight learning may be too slow.
4. Chunk size: a large chunk increases computational time, while a small chunk may not capture concept drift well.

Stream Mining Using MLN: Addressing These Challenges
1. Propose single-model incremental weight learning.
2. Propose selective weight learning.
3. Discretize the domain.
4. Limit the number of predicates in each formula.
5. Empirically estimate the best chunk size.
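Discretizing real-valued attributes keeps the set of constants finite, which addresses the finite-domain-size challenge. The slides do not say which discretization scheme is used; the sketch below assumes equal-width binning over a fixed, known range per attribute, so that a bin id means the same thing in every chunk. The attribute name `Elevation` and its range in the usage line are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def discretize(value: float, lo: float, hi: float, num_bins: int) -> int:
    """Map a real value to an integer bin id in [0, num_bins - 1].

    lo/hi: assumed fixed attribute range; values outside it are clamped
    to the first or last bin so every chunk uses the same finite domain.
    """
    if num_bins < 1 or hi <= lo:
        raise ValueError("need num_bins >= 1 and hi > lo")
    width = (hi - lo) / num_bins
    idx = int((value - lo) // width)
    return max(0, min(idx, num_bins - 1))


def discretize_chunk(chunk: List[Dict[str, Any]],
                     ranges: Dict[str, Tuple[float, float]],
                     num_bins: int) -> List[Dict[str, Any]]:
    """Discretize the real-valued attributes (those listed in `ranges`)
    of every record; other attributes are passed through unchanged."""
    return [
        {attr: discretize(v, *ranges[attr], num_bins) if attr in ranges else v
         for attr, v in record.items()}
        for record in chunk
    ]


if __name__ == "__main__":
    chunk = [{"Elevation": 2100.0, "WildernessArea": 2},
             {"Elevation": 3300.0, "WildernessArea": 1}]
    print(discretize_chunk(chunk, {"Elevation": (1800.0, 3900.0)}, 7))
```

Fixing the range up front trades some resolution for stability: bins computed per chunk would shift as the stream drifts, so the same constant could denote different values in different chunks.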

Our Approach: Basic Algorithm
The user supplies domain knowledge as an initial MLN, and the data stream is processed chunk by chunk:
1. Chunk1 is discretized into dchunk1, and weight learning on dchunk1 turns the initial MLN into MLN1.
2. Chunk2 is discretized into dchunk2; inference with MLN1 on dchunk2 gives the classification error, and weight learning on dchunk2 then updates MLN1 to MLN2.
3. Chunk3 is discretized into dchunk3; inference with MLN2 gives the error, and weight learning on dchunk3 then updates MLN2 to MLN3.
Subsequent chunks follow the same inference-then-learning pattern.

Selective Learning
Do we need weight learning at every chunk? Weight learning is expensive, and the change in data distribution from one chunk to the next may not be significant. The change is therefore measured with the Kullback-Leibler distance, where
$P_M(z)$ is the probability that attribute $a = z$ in the current chunk, and
$P_Q(z)$ is the overall probability that attribute $a = z$.

$$KL_a = D\left(P_M \,\|\, P_Q\right) = \sum_{z} P_M(z) \log \frac{P_M(z)}{P_Q(z)}$$

$$d_a = \frac{KL_a^{prev} - KL_a^{curr}}{KL_a^{prev}}$$

For every attribute $a$, $d_a$ is True if $d_a > \text{Threshold}$ and False if $d_a \le \text{Threshold}$; weight learning is performed when $d_a$ is True.
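The KL distance and the relative change $d_a$ for one attribute can be computed as below. This is a minimal sketch: it uses $d_a = (KL^{prev}_a - KL^{curr}_a) / KL^{prev}_a$ with the sign as given, expresses the threshold as a fraction rather than a percentage, and treats $KL^{prev}_a = 0$ as "no learning", which is an assumption since that case is not covered.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable


def distribution(values: Iterable[Any]) -> Dict[Any, float]:
    """Empirical distribution P(a = z) of one discretized attribute."""
    counts = Counter(values)
    total = sum(counts.values())
    return {z: c / total for z, c in counts.items()}


def kl_divergence(p_m: Dict[Any, float], p_q: Dict[Any, float]) -> float:
    """KL_a = sum_z P_M(z) * log(P_M(z) / P_Q(z)).

    Assumes P_Q(z) > 0 wherever P_M(z) > 0, which holds when the overall
    distribution P_Q already includes the current chunk's data.
    """
    return sum(pm * math.log(pm / p_q[z]) for z, pm in p_m.items() if pm > 0)


def relative_change(kl_prev: float, kl_curr: float) -> float:
    """d_a = (KL_prev - KL_curr) / KL_prev."""
    return (kl_prev - kl_curr) / kl_prev


def needs_learning(kl_prev: float, kl_curr: float, threshold: float) -> bool:
    """True if d_a exceeds the threshold (a fraction, e.g. 0.5 for 50%).

    Assumption: an attribute whose previous KL is 0 never triggers learning.
    """
    if kl_prev == 0:
        return False
    return relative_change(kl_prev, kl_curr) > threshold
```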

Our Approach: Algorithm with Selective Learning
1. Chunk1 is discretized into dchunk1, and its KL values KL1 = {KLA, KLB} (one per attribute) are calculated. State: Current KL = KL1, Previous KL = none, ChunkDist = dchunk1, OverAllDist = ChunkDist. Weight learning on dchunk1, starting from the initial MLN, produces MLN1.
2. Chunk2 is discretized into dchunk2, and KL2 = {KLA, KLB} is calculated. State: Current KL = KL2, Previous KL = KL1, ChunkDist = dchunk2, OverAllDist = ChunkDist + OverAllDist. The distance test returns False, so weight learning is skipped and MLN1 is retained.
3. Chunk3 is discretized into dchunk3, and KL3 = {KLA, KLB} is calculated. State: Current KL = KL3, Previous KL = KL2, ChunkDist = dchunk3, OverAllDist = ChunkDist + OverAllDist. The distance test returns True, so weight learning on dchunk3 updates MLN1 to MLN3.
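The bookkeeping in these steps can be sketched as a loop that decides, chunk by chunk, whether to rerun weight learning. Assumptions not fixed by the algorithm description: records are dicts of already-discretized attribute values, OverAllDist is updated with the current chunk before KL is computed, learning is triggered if any attribute's $d_a$ exceeds the threshold, and an attribute whose previous KL is 0 never triggers learning. The function returns only the schedule; the weight learning itself would be done by a toolkit such as Alchemy.

```python
import math
from collections import Counter
from typing import Dict, List


def _kl(chunk_counts: Counter, overall_counts: Counter) -> float:
    """KL(P_M || P_Q) from raw counts; P_Q includes the chunk, so P_Q(z) > 0."""
    n_m = sum(chunk_counts.values())
    n_q = sum(overall_counts.values())
    return sum(
        (c / n_m) * math.log((c / n_m) / (overall_counts[z] / n_q))
        for z, c in chunk_counts.items()
    )


def selective_schedule(chunks: List[List[Dict[str, int]]],
                       attributes: List[str],
                       threshold: float) -> List[bool]:
    """Return, for each chunk, whether weight learning would run."""
    overall = {a: Counter() for a in attributes}  # OverAllDist per attribute
    prev_kl = None
    decisions = []
    for chunk in chunks:
        chunk_dist = {a: Counter(rec[a] for rec in chunk) for a in attributes}
        for a in attributes:
            overall[a].update(chunk_dist[a])  # OverAllDist = ChunkDist + OverAllDist
        curr_kl = {a: _kl(chunk_dist[a], overall[a]) for a in attributes}
        if prev_kl is None:
            learn = True  # first chunk: always learn from the initial MLN
        else:
            learn = any(
                prev_kl[a] > 0
                and (prev_kl[a] - curr_kl[a]) / prev_kl[a] > threshold
                for a in attributes
            )
        decisions.append(learn)
        prev_kl = curr_kl
    return decisions


if __name__ == "__main__":
    stream = [
        [{"A": 0}, {"A": 0}, {"A": 1}, {"A": 1}],
        [{"A": 0}] * 4,
        [{"A": 0}] * 3 + [{"A": 1}] * 3,
    ]
    print(selective_schedule(stream, ["A"], threshold=0.5))
```

Under these assumptions the second chunk can never trigger learning: the first chunk's KL is 0 because OverAllDist equals ChunkDist at that point.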

Datasets[3]
Dataset and classification problem:
ForestCover: forest cover type
Airline: schedule delay
Poker: poker hand
SyntheticLED: digit displayed
Each dataset contains 100,000 instances.

MLN Formulas: Domain Knowledge
How did we embed domain knowledge? By encoding the relationship between the class and the other attributes.
Example (ForestCover dataset, class attribute: Cover Type): the Neota wilderness area (area 2) would have spruce/fir (type 1).[4]
WildernessArea(o, 2) ⇒ CoverType(o, 1)
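A formula like this one is grounded once per object o in a chunk, and the number of true groundings $n_i$ is what weight learning fits: the gradient of the log-likelihood with respect to $w_i$ is $n_i$ in the data minus its expected value under the current weights. The sketch below counts the true groundings of the ForestCover rule in a chunk of discretized records. The record layout (dicts keyed by attribute name) and the function names are hypothetical, and the connective is treated as an implication.

```python
from typing import Dict, List


def implication_holds(record: Dict[str, int], area: int, cover: int) -> bool:
    """Truth of WildernessArea(o, area) => CoverType(o, cover) for one object o."""
    return record["WildernessArea"] != area or record["CoverType"] == cover


def count_true_groundings(chunk: List[Dict[str, int]],
                          area: int = 2, cover: int = 1) -> int:
    """n_i: number of objects in the chunk whose ground formula is true."""
    return sum(implication_holds(r, area, cover) for r in chunk)


if __name__ == "__main__":
    chunk = [
        {"WildernessArea": 2, "CoverType": 1},  # area 2, spruce/fir: true
        {"WildernessArea": 2, "CoverType": 3},  # area 2, other type: false
        {"WildernessArea": 1, "CoverType": 2},  # not area 2: vacuously true
    ]
    print(count_true_groundings(chunk))
```

Objects outside area 2 satisfy the implication vacuously, so they add to $n_i$ without providing evidence about the cover type.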

Experiments: Error Analysis
Weight learning and inference are performed with the Alchemy toolkit.[5] Classification error is compared against state-of-the-art stream classifiers using the MOA toolkit.[6]

Results: Error Analysis
Classification error by dataset (chunk size): ForestCover (500), Airline (500), Poker (750), SyntheticLED (750).
Classifiers compared: MLN, HMLN, SluiceBox, Hoeffding Tree, Naïve Bayes, Perceptron, SGD, SingleClassifierDrift, OzaBoost-Adwin, Accuracy-Updated-Ensemble.
Brzezinski, D., & Stefanowski, J. (2014). Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems, 25(1),

Addressing a Limitation
Major limitation: weight learning is slow because it requires multiple iterations.
Comparison per dataset (ForestCover, Airline, Poker, SyntheticLED): without selective learning (SL), time (s) and error (%); with SL, threshold (%), time (s) and error (%).

Conclusion
We adapted Markov Logic Networks for stream mining and evaluated an incremental learning approach. The domain-knowledge-based approach outperforms the state-of-the-art stream classifiers it was compared against.

Thank you
Datasets and MLNs are available at
