Combining Proactive and Reactive Predictions for Data Streams


Combining Proactive and Reactive Predictions for Data Streams

Ying Yang
School of Computer Science and Software Engineering, Monash University
Melbourne, VIC 3800, Australia

Xindong Wu and Xingquan Zhu
Department of Computer Science, University of Vermont
Burlington, VT 05405, USA

ABSTRACT

Mining data streams is important in both science and commerce. Two major challenges are (1) the data may grow without limit so that it is difficult to retain a long history; and (2) the underlying concept of the data may change over time. Different from the common practice that keeps recent raw data, this paper uses a measure of conceptual equivalence to organize the data history into a history of concepts. Along the journey of concept change, it identifies new concepts as well as re-appearing ones, and learns transition patterns among concepts to help prediction. Different from the conventional methodology that passively waits until the concept changes, this paper incorporates proactive and reactive predictions. In a proactive mode, it anticipates what the new concept will be if a future concept change takes place, and prepares prediction strategies in advance. If the anticipation turns out to be correct, a proper prediction model can be launched instantly upon the concept change. If not, it promptly resorts to a reactive mode: adapting a prediction model to the new data. A system, RePro, is proposed to implement these new ideas. Experiments compare the system with representative existing prediction methods on various benchmark data sets that represent diversified scenarios of concept change. Empirical evidence demonstrates that the proposed methodology is an effective and efficient solution to prediction for data streams.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications—data mining; I.2.6 [Artificial Intelligence]: Learning

General Terms: Algorithms

Keywords: Data stream, proactive learning, conceptual equivalence

1.
INTRODUCTION

Prediction for data streams is important. Streaming data, such as transaction flows in stock markets, are very common in modern society. This particular paper is set in the context of classification learning. A data stream is a sequence of instances. The task is to predict every instance's class.

Prediction for data streams is however not a trivial task. Streaming data have special characteristics that bring new challenges. First, the data may grow without limit and it is difficult to retain their complete history. Since one often needs to learn from the history, what information should be kept is critical. Second, the underlying concept of the data may change over time. This requires updating the prediction model accordingly and promptly.

Substantial progress has been made on the problem of prediction for data streams [2, 3, 4, 5, 10, 12, 13]. However, a number of problems remain open. First, previous approaches have not well organized nor made good use of the history of data streams. Two typical methodologies exist. One is to keep a recent history of raw instances, since the whole history is too much to retain. As to be explained later, it may be applicable to concept drift but loses valuable information in the context of concept shift. Another methodology, upon detecting a concept change, discards the history corresponding to the outdated concept and accumulates instances to learn the new concept from scratch.

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'05, August 21-24, 2005, Chicago, Illinois, USA. Copyright 2005 ACM 1-59593-135-X/05/0008...$5.00.)
As to be demonstrated later, it ignores useful experience and is suboptimal if a previous concept is re-established. A second open problem is that existing approaches are mainly interested in predicting the class of each specific instance. No significant effort has been devoted to foreseeing a bigger picture, that is, what the new concept will be if a concept change takes place. This prediction, if possible, is at a more general level and is proactive. It helps prepare for the future and allows a proper prediction strategy to be launched instantly upon detecting a concept change. For example, people have learned over time the concept change among seasons, such as that winter comes after autumn. As a result, people prepare for the winter during the autumn, which is of great help in many aspects of life. However, the problem of predicting the oncoming concept can be far more complicated than this example. The transition among concepts can be probabilistic rather than deterministic. Brand new concepts can show up, of which the history has no information. This paper sets out to tackle those open problems. The goals involve (1) organizing the history of raw data into a history of compact concepts by identifying new concepts as well as re-appearing historical ones; (2) learning patterns of concept transition from the concept history; and (3) carrying

out effective and efficient prediction at two levels: a general level of predicting each oncoming concept and a specific level of predicting each instance's class.

2. TERMINOLOGY

A data stream is a sequence of instances. Each instance is a vector of attribute values with a class label. If its class is known, an instance is a labelled instance. Otherwise it is unlabelled. Predicting for an instance is to decide the class of an unlabelled instance by evidence from labelled instances. A concept is represented by the learning results of a classification algorithm, such as a classification rule set or a Bayesian probability table. Concept change has three modes. Concept drift is gradual [9, 11, 13], while concept shift is more abrupt [6, 9]. Sampling change [8, 13] is a change of data distribution that leads to revising the current model even when the concept remains the same. One may assume for concept drift that the current concept is the most similar to the oncoming concept because of its continuity [12]. Concept shift however does not support this assumption. For instance, suppose a Democrat government succeeds a Republican government. To predict what national policies will take place, it is better to check what happened under previous Democrat governments in history, rather than what happened under the last Republican one, although it was the most recent.

3. BUILDING CONCEPT HISTORY

This history is built along the journey of concept change by identifying new concepts as well as re-appearing ones. First, it retains the essential information of a long history whose raw data are prohibitive to keep. Second, it stores previous concepts so that they can be reused if needed, which is more efficient than learning from scratch. Third, the transitions among concepts can be learned therefrom, which helps proactively predict what the oncoming concept is.

3.1 Four key components

A classification algorithm is used to abstract a concept from the raw data.
A user may choose any classification algorithm of his/her interest, or choose one that appears to be good at learning the current data. In this particular paper, C4.5rules [7] is employed since it is commonly used and can achieve reasonable classification accuracy in general.

A trigger detection algorithm finds instances across which the underlying concept has changed and the prediction model should be modified. It is especially important when the concept shifts. Keogh and Kasetty [4] and Tsymbal [11] have reviewed numerous trigger detection algorithms. A sliding-window methodology is used here, with two parameters: window size and error threshold. The beginning of the window is always a misclassified instance. When the window is full and its error rate exceeds the error threshold, the beginning instance is taken as a trigger; otherwise, the beginning of the window is slid to the next misclassified instance (if there is any) and previous instances are dropped from the window.

A conceptual equivalence measure checks whether a concept is historical or brand new. As in Table 1, it calculates the equivalence degree between a newly-learned concept V and each historical concept T by comparing their verdicts on each available instance. If the degree is above a user-defined threshold, V is a re-appearing T. Otherwise, V is a new concept. This measure is more proper than the conceptual equivalence measure used in previous work [14], which relies on the classification accuracies of V and T on D, because accuracy and the degree of conceptual equivalence are not necessarily positively correlated. Even when V and T classify D with low accuracies, their equivalence degree can be very high if they agree on classifying many instances, even when their classifications are both wrong.

Table 1: Measuring conceptual equivalence
Input: concept V newly learned from current data D, and historical concept T.
Output: ce ∈ [-1, 1]; the bigger the value, the higher the degree of conceptual equivalence between V and T.
Begin
  ce = 0;
  FOREACH instance I ∈ D
    result1, result2 = respectively classify I by V and T;
    IF (result1 != result2) score = -1;
    ELSE IF (result1 == nomatch) score = 0;
    ELSE score = 1;
    ce += score;
  return (ce = ce / |D|);
End

A stable learning size specifies the number of instances above which the learned concept can be deemed stable. In order to be sensitive to concept change, the trigger-window size is normally small and is sometimes insufficient to induce an adequate concept. If one has to learn a concept out of these few instances, the concept is not stable (for example, it has high classification variance) and should be re-learned once a stable-learning-size of labelled instances has been accumulated. In practice, the system keeps two concept histories: H, retaining stable concepts and triggers; and H', retaining possibly unstable concepts and triggers. Assume the stable learning size is s and the instance index for the last stable trigger is j. Each concept C_t learned before j+s is stored in H'. When the classes of instances across [j, j+s) are known, a single concept C_k is learned therefrom. If the C_t's average classification accuracy is no better than C_k's, C_k is deemed a stable concept and is stored into H, and H' is updated by H. Otherwise, the C_t's are deemed a series of stable concepts and H is updated by H'. Although H' may be temporarily used to predict each instance's class, H is always used to learn transition matrices and to predict the oncoming concept.

3.2 The building process

Integrating all the above key components, the process of building a concept history is illustrated in Table 2. For demonstration's sake, assume that the window size is 10, the stable learning size is 30 and the trigger error threshold is 55%. In Table 2's illustrations, distinct markers denote, respectively, an instance where a stable trigger is detected, an instance where a temporary trigger is detected, a correctly classified instance, and a wrongly classified instance.
Only instances coming after the last stable trigger and up to now are retained for the purpose of calculating conceptual equivalence. Other historical instances are not essential to keep.

4. BUILDING PREDICTION MODELS

As in Stage 3 of Table 2, upon trigger detection one should change the model used to classify oncoming instances. The available information is (1) a history of concepts up to this trigger; and (2) a window of labelled instances, which contributed to detecting this new trigger. Depending on when one starts preparing this new model, prediction can be proactive or reactive. Each approach has its own pros and cons. A system RePro is proposed to combine their strengths.
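The conceptual equivalence measure of Table 1 reduces to a few lines of code. Below is a minimal Python sketch, under the assumption that concepts are callables returning a class label, or the string "nomatch" when no rule covers the instance; all names here are illustrative, not from the original system:

```python
def conceptual_equivalence(V, T, D):
    """Degree of conceptual equivalence between concept V and concept T,
    measured by comparing their verdicts on each instance in D (Table 1).
    Returns a value in [-1, 1]; larger means more equivalent."""
    ce = 0
    for instance in D:
        r1, r2 = V(instance), T(instance)
        if r1 != r2:
            ce -= 1   # the concepts disagree
        elif r1 == "nomatch":
            ce += 0   # neither concept covers the instance
        else:
            ce += 1   # the concepts agree, even if both are wrong
    return ce / len(D)
```

Note how the measure rewards agreement regardless of whether the shared verdict is correct, which is exactly why it can diverge from an accuracy-based comparison.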

Table 2: The process of building concept history

Stage 0: Initially, one can randomly guess each instance's class. When a stable-learning-size of instances have known classes, the first concept can be learned and stored in the stable history.

Stage 1 (timeline markers at t-40, t, t+9; stable trigger; full window): Starting from t, errors are witnessed. Up to t+9, the window is full and the error rate is 80%, which is above the error threshold. Hence, t is detected as a new trigger.

Stage 2 (markers at t-40, t, t+9; learn C_k): Since there are no less than 30 instances between t-40 (the last stable trigger) and t, the concept C_k learned from [t-40, t) is stable. Use conceptual equivalence (as in Table 1) to check whether C_k is a historical concept or a new one. Update both the stable and the temporary concept histories.

Stage 3 (markers at t-40, t, t+18; learn C_t; temporary trigger): A new prediction model is chosen for instances coming after t+9. Assume a new trigger is detected at t+18. A concept C_t learned over [t, t+18) can be unstable since it is learned from too few instances. Store it only into the temporary concept history.

Stage 4 (markers at t-40, t, t+18, t+29; stable learning size; learn C_k+1): When time arrives at t+29, 30 instances are known. If no (temporary) trigger happens in [t, t+29], do nothing. Otherwise, learn a concept C_k+1 from [t, t+29]. If C_k+1's accuracy is competitive with the temporary C_t's, store C_k+1 into the stable history and update the temporary history by the stable history. Otherwise update the stable history by the temporary history.

4.1 Proactive mode

A proactive mode predicts the forthcoming concept given the current one by evidence from the concept history. It is ahead of trigger detection and is independent of the trigger window. Once a new trigger is detected, the predicted concept immediately takes over the classification task. The advantages are the quick response and a stable prediction model. The concept history is treated as a Markov chain where each distinct concept is a state.
A Markov chain is a model commonly used in information retrieval. It involves a sequence of states where the probability of the current state depends on the past state(s). If the probability depends only on the immediately past state, it is a first-order Markov chain. Otherwise it is a higher-order Markov chain. A transition matrix can be built and updated from the chain over time, recording the frequency or probability of each alternative concept following a candidate concept. For example, a sequence of arriving weather concepts is: spring, summer, autumn, winter, spring, summer, hurricane, autumn, winter, spring, flood, summer, autumn, winter, spring, summer, autumn, winter, spring, summer, hurricane, autumn, ... Table 3 shows the accumulated first-order transition matrix. Whenever a new concept comes, a new row and a new column are added to the table.

Table 3: Transition matrix (rows: current state; columns: next state)

              spring  summer  autumn  winter  hurricane  flood  ...
  spring         .       4       .       .        .        1
  summer         .       .       3       .        2        .
  autumn         .       .       .       4        .        .
  winter         4       .       .       .        .        .
  hurricane      .       .       2       .        .        .
  flood          .       1       .       .        .        .
  ...

Transition patterns among concepts can be learned from the transition matrix, such as that the most likely concept after autumn is winter. However, problems can happen when the anticipation is less confident, such as when no state has a dominant probability given the current state. In this case, one may turn to higher-order transition matrices; but the higher the order of the matrix to maintain, the more expensive the system becomes. Alternatively, one can use a reactive approach to break ties, which is the topic of the following section.

4.2 Reactive mode

A reactive mode waits until the concept has changed to construct a prediction model depending on the trigger instances. It can be either contemporary or historical. Upon detecting a trigger, the contemporary-reactive mode does not consult the concept history. Instead, it trains a prediction model simply from the trigger instances to classify oncoming instances.
Although straightforward, this approach risks high classification variance when the sliding window is small. For example, C4.5rules can almost always form a set of rules with a high classification accuracy across a window of instances. The rules, however, are very likely to misclassify oncoming instances because of their poor generalization strength. As a result, another new trigger is soon detected even when the data are from the same concept. Nonetheless, it can be a measure of last resort if the concept is brand new and has no history to learn from.

Upon detecting a trigger, the historical-reactive mode retrieves the concept from the history that is most appropriate for the trigger instances. It tests each distinct historical concept's classification accuracy across the trigger window. The one with the highest accuracy is chosen as the new prediction model. One merit of consulting the concept history is that each concept accepted by the history is a stable classifier. Hence it can avoid the contemporary mode's problem. However, there are also potential disadvantages. One problem happens when the new concept is very different from existing ones; consequently, the history offers only a low classification accuracy on the window. Another potential concern is efficiency. If there are many distinct concepts in the history, it may take a while to test every single one.

4.3 RePro, a coalition of strength

A system RePro (reactive plus proactive), as in Table 4, incorporates proactive and reactive predictions, using each one's strength to offset the other's weakness. As a result, if the proactive component of RePro foresees multiple probable concepts given the current concept, one can use the historical-reactive component as a tie breaker. The other way around, if there are many historical concepts, the proactive component can offer a short list to the historical-reactive component and speed up the process.
If both the proactive and historical-reactive modelling incur low accuracy in the new trigger window, it indicates that the new concept is very different from historical ones. Hence, a contemporary-reactive classifier can help to cope with the emergency.
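The three-way interplay described in this section can be sketched in Python. This is a simplified illustration, not the original implementation: concepts are assumed to expose a `classify` method, the transition matrix is a nested dict of counts, and every name (including the default thresholds) is invented for the sketch:

```python
def choose_next_concept(history, transitions, c_last, window, train,
                        threshold_prob=0.3, threshold_accu=0.5):
    """Select the prediction model for instances after a trigger by
    combining proactive and reactive modes.
    history     -- distinct stable historical concepts
    transitions -- {concept: {successor: count}} first-order matrix
    c_last      -- last stable concept before the trigger
    window      -- labelled trigger instances as (instance, class) pairs
    train       -- function that learns a fresh concept from the window
    """
    def accuracy(concept):
        return sum(concept.classify(x) == y for x, y in window) / len(window)

    # Proactive: anticipate likely successors of c_last.
    row = transitions.get(c_last, {})
    total = sum(row.values())
    candidates = [c for c, n in row.items()
                  if n / total > threshold_prob] if total else []

    if len(candidates) > 1:
        # Multiple probable successors: historical-reactive tie breaker.
        c_next = max(candidates, key=accuracy)
    elif candidates:
        c_next = candidates[0]
    elif history:
        # No confident anticipation: scan the whole concept history.
        c_next = max(history, key=accuracy)
    else:
        c_next = None

    # Contemporary-reactive: last resort when the history fits poorly.
    if c_next is None or accuracy(c_next) < threshold_accu:
        c_next = train(window)
    return c_next
```

The proactive path costs only a matrix lookup, the historical-reactive path scans stored classifiers, and training from the window alone is deliberately the last resort, mirroring the trade-offs discussed above.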

Nonetheless, as explained before, in order to avoid the high classification variance problem, when a stable-learning-size of labelled instances has been accumulated, one should update the contemporary-reactive concept by a stable concept.

Table 4: RePro: reactive plus proactive
Input: a list of distinct historical concepts C_DIS, a concept sequence C_SEQ, a transition matrix C_TRA, a window of trigger instances I_WIN, a probability threshold threshold_prob, and a classification accuracy threshold threshold_accu.
Output: concept c_next for oncoming instances.
Begin
  c_last = the last stable concept in C_SEQ;
  // Proactive
  c_next(s) = concept(s) whose probability given c_last is bigger than threshold_prob, according to C_TRA;
  IF multiple c_next's exist
    // Break the tie by historical-reactive
    FOREACH c_next, calculate its accuracy on I_WIN;
    c_next = the one acquiring the highest accuracy;
  // Historical-reactive
  IF no c_next exists
    FOREACH concept c_historical ∈ C_DIS, calculate its accuracy on I_WIN;
    c_next = the one acquiring the highest accuracy;
  // Contemporary-reactive
  IF the accuracy of c_next on I_WIN is less than threshold_accu
    c_next = the concept learned from I_WIN;
  return c_next;
End

5. RELATED WORK

Many approaches have been published that predict in concept-changing scenarios. Three typical mechanisms are discussed here. More comprehensive reviews can be found in various informative survey papers [4, 11]. The FLORA system [13] relates to RePro in that it stores old concepts and reuses them if appropriate. However, it is oriented to data of small sizes instead of streaming data [3]. It represents concepts by conjunctions of attribute values and measures conceptual equivalence by syntactical comparison. This is less applicable to streaming data, where learned rules can easily be large in number and protean in appearance. Besides, FLORA stores the concepts only for a reactive purpose. It does not explore their associations and hence cannot be proactive.
The concept-adapting very fast decision tree (CVFDT) [3] is one of the best-known systems that achieve efficient classification in streaming data. It represents a popular methodology that continuously adapts the prediction model to the coming instances. CVFDT relates to RePro in that it detects triggers. However, upon finding a concept change, CVFDT builds a new prediction model from scratch. It resembles the contemporary-reactive modelling. In cases where history repeats itself, CVFDT cannot take advantage of previous experience and hence may be less efficient. In the gap where the old model is outdated and the new model has not matured, the prediction accuracy may suffer.

The weighted classifier ensemble (WCE) [12] represents a family of algorithms that ensemble learners for prediction. To classify a coming instance, WCE divides the historical raw data into sequential chunks of fixed size, builds a classifier from each chunk, and composes an ensemble where each classifier is weighted proportionally to its classification accuracy on the chunk most recent to the instance to be classified. WCE relates to RePro in that it uses a history of concepts (classifiers). However, the quality of this history is controlled by an arbitrary chunk size k. There is no trigger detection and no conceptual equivalence. As a result, the following two scenarios can witness sub-optimal prediction accuracies.

Scenario 1: in the most recent data chunk C_1, which is the gauge used to weigh classifiers in the ensemble, the majority of instances are not from the new concept. In the accompanying illustration, an instance of concept Ψ is to be classified; the stream has been divided into chunks C_4 through C_1, where C_4 holds Ψ instances while the majority of instances in C_1 are of concept Ω. Consequently, classifiers suitable for Ω (such as C_1 and C_2) will ironically be given a higher priority than those suitable for Ψ (such as C_4).

Scenario 2: the previous concept dominates the retained raw data.
In the accompanying illustration, a long run of Ω instances spans chunks C_4 to C_2, followed by Ψ instances in the most recent chunk C_1. When classifying an instance of concept Ψ, most classifiers are still indulged in the previous concept Ω. As a result, even when the gauge chunk C_1 favors instances of Ψ, its correct weighing can be overshadowed by the large number of improper classifiers (a single high-weight C_1 against many low-weight C_2, C_3 and C_4).

6. EXPERIMENTS

Experiments are conducted to verify whether RePro outperforms WCE and CVFDT on predicting for data streams in various types of concept-changing scenarios.

6.1 Data

Benchmark data are employed to cover concept drift, concept shift and sampling change. When applicable, instances are randomly generated with time as the seed. Hence, the reappearance of concepts does not necessarily accompany the reappearance of instances, which better simulates the real world.

Hyperplane can simulate concept drift by its continuous moving [3, 5, 10, 11, 12]. A hyperplane in a d-dimensional space [0, 1]^d is denoted by Σ_{i=1..d} w_i x_i = w_0, where each vector of variables <x_1, ..., x_d> is a randomly generated instance and is uniformly distributed in the space. Instances satisfying Σ_{i=1..d} w_i x_i >= w_0 are labelled as positive, and otherwise negative. The value of each coefficient w_i is continuously changed so that the hyperplane is gradually drifting in the space. Besides, the value of w_0 is always set as (1/2) Σ_{i=1..d} w_i so that roughly half of the instances are positive, and the other half are negative. Particularly in this experiment, each w_i's changing range is [-3, +3] and its changing rate is 0.5 per instance.

Stagger can simulate concept shift [5, 9, 11, 13]. There are three alternative underlying concepts. A: if color = red AND size = small, class = positive; otherwise, class = negative. B: if color = green OR shape = circle, class = positive; otherwise, class = negative. C: if size = medium OR size = large, class = positive; otherwise, class = negative.
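The three Stagger concepts translate directly into Boolean predicates. A small illustrative sketch (attribute values as named in the text):

```python
def concept_a(color, size, shape):
    # A: positive iff color = red AND size = small
    return color == "red" and size == "small"

def concept_b(color, size, shape):
    # B: positive iff color = green OR shape = circle
    return color == "green" or shape == "circle"

def concept_c(color, size, shape):
    # C: positive iff size = medium OR size = large
    return size in ("medium", "large")
```

During the stream the labelling concept switches among A, B and C, so a model fitted to one concept abruptly mislabels under the next, which is the defining trait of concept shift.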
Particularly in this experiment, 5 instances are randomly generated according to each concept. (FLORA is not involved in the comparison since it is not oriented to handling streaming data's large volume.) Besides, one can control the transition among concepts from stochastic to deterministic by tuning the z parameter of a Zipfian distribution, which has been detailed in Appendix A of [15]. In the remainder of the paper, unless otherwise mentioned, z is set as 1, simulating the most common case in the real world where the transitions among concepts are probabilistic (neither deterministic nor totally random).

Network intrusion can simulate sampling change [1]. It was used for the 3rd International KDD Tools Competition. It includes a wide variety of intrusions simulated in a military network environment. The task is to build a prediction model capable of distinguishing between normal connections (Class 1) and network intrusions (Classes 2, 3, 4 and 5). Different periods witness bursts of different intrusion classes. Assuming all data were simultaneously available, learners like C4.5rules could achieve a high classification accuracy on the whole data set. Hence one may think that there is only a single concept underlying the data. However, in the context of streaming data, a learner's observation is limited since instances come one by one.

6.2 Rival methods

The original implementation of WCE specifies a parameter, the chunk size s. It does not update the ensemble until another s instances have come and been labelled. Readers may be curious about its performance under different frequencies of updating the ensemble. Hence, a more dynamic version, dynamic WCE (DWCE), is also implemented here. DWCE specifies an additional parameter, the buffer size f (f <= s). Once f new instances are labelled, DWCE re-partitions the data into chunks of size s, and re-trains and re-ensembles the classifiers. According to empirical evidence as detailed in Appendix B of [15], when it is sufficient to avoid high classification variance, the smaller the chunk size, the lower the error WCE gets. With the same chunk size, DWCE with a smaller buffer size can outperform WCE in accuracy.
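The chunk-based weighting that WCE performs (and that DWCE re-applies on every buffer refresh) can be sketched as follows. This is an illustrative simplification, not WCE's actual code: classifiers are plain callables, and each is weighted by its accuracy on the most recent labelled chunk before a weighted vote:

```python
def wce_predict(classifiers, recent_chunk, instance):
    """Weighted-classifier-ensemble prediction: weight each classifier by
    its accuracy on the most recent labelled chunk, then take a weighted
    vote on the instance to be classified."""
    weights = []
    for clf in classifiers:
        correct = sum(clf(x) == y for x, y in recent_chunk)
        weights.append(correct / len(recent_chunk))
    votes = {}
    for clf, w in zip(classifiers, weights):
        label = clf(instance)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

This also makes Scenario 1 concrete: if the recent chunk still holds mostly Ω instances, classifiers fitted to Ω receive the larger weights even though the instance to be classified is from Ψ.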
WCE and DWCE with the parameter settings that produced the lowest error rates are taken to compare with RePro. As for CVFDT, the software published by its authors is used. There are abundant parameters; various parameter settings are tested and the best results are taken to compare with RePro.

6.3 Results and analysis

The prediction error and time of each method on each data set are illustrated in Figure 1. The incremental results along the time line are depicted in Figures 2, 3 and 4. Generally, RePro achieves lower error rates than all other methods on all data sets, except for DWCE in Hyperplane, in which case RePro is far more efficient than DWCE. As a matter of fact, DWCE's time overhead is prohibitively high, making it almost unfeasible for predicting in streaming data.

Figure 1: Comparing efficacy and efficiency (error rate and time in seconds of RePro, WCE, DWCE and CVFDT on Stagger, Hyperplane and Intrusion).

Specifically, in Stagger the concept shifts among A, B and C every 5 instances. RePro's prediction error rate increases upon each concept change and decreases later when a proper prediction model is constructed. With time going by and the concept history growing longer, RePro learns the transitions among concepts better, adjusts faster to concept change and achieves the lowest error rate. The second best method is CVFDT. Compared with RePro, CVFDT always learns a concept from scratch upon trigger detection, no matter whether this concept is a reappearance of an old one. This rebuilding needs time and incurs high classification variance before the classifier becomes stable. Hence it has a slower adaptation to the concept change and incurs more prediction errors than RePro. Although DWCE is, as expected, better than WCE in terms of prediction accuracy, both are trigger-insensitive and produce the highest error rates in the context of concept shift.
Figure 2: Incremental results on Stagger.

In Hyperplane, the concept drifts up and down. This is the niche for WCE and DWCE because the most recent concept is always the most similar to the new one. Nonetheless, RePro offers a surprisingly competitive accuracy. An insight into RePro reveals that the conceptual equivalence measure distinguishes all concepts into two equivalence groups: one with w_i ∈ [-3, 0] and the other with w_i ∈ (0, +3]. Otherwise the hyperplanes are too close to be worth differentiating. As a result, a concept sequence of A, B, A, B, ... is learned. To classify an instance of a latter B, RePro employs a former B. This is no less effective than classifying an instance by its recent instances when the concept drifts. As for CVFDT, it incurs the highest error rate. The reason is that the concept Σ_{i=1..d} w_i x_i >= w_0 is not an easy one to learn and CVFDT does not learn from the history. CVFDT needs to wait long before it builds a new stable classifier from scratch, during which period many prediction mistakes have been made. This is compounded by the dilemma that the concept may soon change again after this (relatively long) waiting period

and the newly-stabilized classifier has to be discarded then.

Figure 3: Incremental results on Hyperplane.

In Network Intrusion, transitions among different intrusion types are more random than deterministic. Hence, the reactive component of RePro often takes over the prediction task. Because an intrusion type sometimes re-appears in the data stream, RePro is able to reuse historical concepts and achieves the lowest prediction error. CVFDT acquires a competitive performance because an intrusion type is much easier to learn than a hyperplane, even if the learning starts from scratch. Again, WCE and DWCE witness suboptimal results because there does not necessarily exist continuity among intrusion types. Hence the recent concepts are not always the most appropriate to use.

Figure 4: Incremental results on Network Intrusion.

In summary, RePro excels in scenarios of concept shift and sampling change. It also offers a competitive accuracy when the concept drifts. In all cases, RePro is very efficient.

7. CONCLUSION

Human beings live in a changing world whose history may repeat itself. Hence, both foreseeing the future and retrospecting on the history are important. This paper has proposed a novel mechanism to organize data that stream along the time line into a concept history. As a result, the problem of the intractable amount of streaming data is solved, since concepts are much more compact than raw data while still retaining the essential information. Besides, patterns among concept transitions can be learned from this history, which helps foresee the future. This paper has also proposed a novel system, RePro, to predict for concept-changing data streams.
Not only can RePro conduct a reactive prediction: detecting the concept change and accordingly modifying the prediction model for oncoming instances; but also RePro can be proactive: predicting the coming concept given the current concept. By making good use of the concept history, and incorporating reactive and proactive modelling, RePro is able to achieve both effective and efficient predictions in various scenarios of concept change, including concept shift, concept drift and sampling change. Although different types of changes have different characteristics and require different optimal solutions, a challenge in reality is that one seldom knows beforehand what type of change is happening. Hence a mechanism like RePro that performs well in general is very useful. 8. REFERENCES [1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In Proc. of 29th VLDB, pages 81 92, 23. [2] V. Ganti, J. Gehrke, and R. Ramakrishnan. Demon: Mining and monitoring evolving data. IEEE TKDE, 13(1):5 63, 21. [3] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proc. of 7th SIGKDD, pages 97 16, 21. [4] E. Keogh and S. Kasetty. On the need for time series data mining benchmarks: a survey and empirical demonstration. In Proc. of 8th SIGKDD, pages , 22. [5] J. Z. Kolter and M. A. Maloof. Dynamic weighted majority: A new ensemble method for tracking concept drift. In Proc. of 3rd ICDM, pages , 23. [6] C. Lanquillon and I. Renz. Adaptive information filtering: Detecting changes in text streams. In Proc. of 8th CIKM, pages , [7] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, [8] M. Salganicoff. Tolerating concept and sampling shift in lazy learning using prediction error context switching. Artificial Intelligence Review, 11(1-5): , February [9] K. O. Stanley. Learning concept drift with a committee of decision trees, 23. 
Technical Report AI-03-302, Department of Computer Sciences, University of Texas at Austin, USA.
[10] W. N. Street and Y. Kim. A streaming ensemble algorithm (SEA) for large-scale classification. In Proc. of 7th SIGKDD, 2001.
[11] A. Tsymbal. The problem of concept drift: definitions and related work, 2004. Technical Report TCD-CS-2004-15, Computer Science Department, Trinity College Dublin, Ireland.
[12] H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proc. of 9th SIGKDD, 2003.
[13] G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1):69-101, 1996.
[14] Y. Yang, X. Wu, and X. Zhu. Dealing with predictive-but-unpredictable attributes in noisy data sources. In Proc. of 8th PKDD, 2004.
[15] Y. Yang, X. Wu, and X. Zhu. Proactive-reactive prediction for data streams, 2005. Technical Report CS-05-03, Department of Computer Sciences, University of Vermont, USA.


TU-E2090 Research Assignment in Operations Management and Services Aalto University School of Science Operations and Service Management TU-E2090 Research Assignment in Operations Management and Services Version 2016-08-29 COURSE INSTRUCTOR: OFFICE HOURS: CONTACT: Saara

More information

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model

Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Unsupervised Learning of Word Semantic Embedding using the Deep Structured Semantic Model Xinying Song, Xiaodong He, Jianfeng Gao, Li Deng Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.

More information

Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio

Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio SCSUG Student Symposium 2016 Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio Praneth Guggilla, Tejaswi Jha, Goutam Chakraborty, Oklahoma State

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

NCEO Technical Report 27

NCEO Technical Report 27 Home About Publications Special Topics Presentations State Policies Accommodations Bibliography Teleconferences Tools Related Sites Interpreting Trends in the Performance of Special Education Students

More information