Learning Naïve Bayes Tree for Conditional Probability Estimation


Han Liang 1, Yuhong Yan 2

1 Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada E3B 5A3
2 National Research Council of Canada, Fredericton, NB, Canada E3B 5X9
yuhong.yan@nrc.gc.ca

(This work was done when the author was a visiting worker at the Institute for Information Technology, National Research Council of Canada.)

Abstract. Naïve Bayes Tree uses a decision tree as the general structure and deploys naïve Bayesian classifiers at the leaves. The intuition behind it is that naïve Bayesian classifiers work better than decision trees when the sample data set is small. Therefore, after several attribute splits when constructing a decision tree, it is better to use naïve Bayesian classifiers at the leaves than to continue splitting the attributes. Naïve Bayes Tree is used to improve classification accuracy and Area Under the Curve (AUC). In this paper, we propose a learning algorithm to improve the conditional probability estimation within the framework of Naïve Bayes Tree. The motivation for this work is that, for cost-sensitive learning where costs are associated with conditional probabilities, the score function is optimized when the estimates of conditional probabilities are accurate. The learning algorithm is a greedy and recursive process in which, at each step, the conditional log likelihood (CLL) is used as the metric to expand the decision tree. When certain stopping conditions are met, the algorithm uses naïve Bayes to estimate the probabilities of the leaf attributes given the class variable and the path attributes. The whole tree encodes the conditional probability in its structure and parameters. An additional benefit of accurate conditional probability estimates is that both classification accuracy and AUC can be improved. On a large suite of benchmark sample sets, our experiments show that the CLL tree (CLLTree) significantly outperforms state-of-the-art learning algorithms, such as Naïve Bayes Tree and naïve Bayes, in yielding accurate conditional probability estimation and improving classification accuracy and AUC.

1 Introduction

Classification is a fundamental issue of machine learning in which a classifier is induced from a set of labeled training samples, each represented by a vector of attribute values and a class label. We denote the attribute set by A = {A_1, A_2, ..., A_n}, and an assignment of values to the attributes in A by the corresponding bold-face lowercase letter a. We use C to denote the class variable and c to denote its value.

Thus, a training sample is E = (a, c), where a = (a_1, a_2, ..., a_n) and a_i is the value of attribute A_i. A classifier is a function f that maps a sample E to a class label c, i.e. f(a) = c. An inductive learning algorithm returns a function h that approximates f; the function h is called a hypothesis. The classifier can predict the assignment of C for an unlabeled testing sample E_t = (b), i.e. h(b) = c_t.

Various inductive learning algorithms, such as decision trees, Bayesian networks, and neural networks, can be categorized into two major approaches: the probability-based approach and the decision boundary-based approach. In a generative probability learning algorithm, a probability distribution p(A, C) is learned from the training samples as a hypothesis. Then we can, in principle, compute the probability of any E in the probability space. A testing sample E_t = (b) is classified into the class c with the maximum posterior class probability p(c|b) (or simply class probability), as shown below:

    h(b) = \arg\max_{c \in C} p(c|b) = \arg\max_{c \in C} p(c, b) / p(b)    (1)

Decision tree learning algorithms are well known as decision boundary-based. Although their probability estimates are poor, these algorithms can make good decisions about which side of the boundary a sample falls on. Decision trees work better when the sample data set is large, because after several attribute splits the number of samples in each subspace becomes too small to base a decision on, and a naïve Bayesian classifier works better in that case. Therefore, instead of continuing to split the attributes, naïve Bayesian classifiers are deployed at the leaves. [5] proposed this hybrid model, called Naïve Bayes Tree (NBTree). It is reported that NBTree outperforms C4.5 and naïve Bayes in classification accuracy and AUC.

In this paper, we try to benefit from both decision trees and probability models: we propose to use NBTree to improve the estimation of the conditional probability of the class given the attributes, i.e. p(C|A). Accurate conditional probability is important in several respects. First, in cost-sensitive classification, knowing the accurate conditional probability is crucial in making a decision; determining only the decision boundary is not enough. Second, improving conditional probability can possibly improve classification accuracy, though it is not a necessary condition. Third, improving conditional probability can improve AUC, which is a metric used for ranking.

Our proposed learning algorithm is a greedy and recursive procedure similar to NBTree. In each step of expanding the decision tree, the Conditional Log Likelihood (CLL) is used as the score function to select the best attribute to split, where the CLL of a classifier B, given a (sub) sample set S of n samples, is

    CLL(B|S) = \sum_{s=1}^{n} \log P_B(C|A)    (2)

The splitting process ends when certain conditions are met. Then, for the samples at the leaves, naïve Bayesian classifiers are generated. This kind of NBTree optimizes the estimation of conditional probability. We call the generated tree the CLL Tree (CLLTree).
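To make Equations 1 and 2 concrete, the following is a minimal sketch (not the authors' code): `predict_proba` is an assumed hook that returns the class posteriors P_B(c|a) of some classifier B as a dictionary, and each sample is a pair of an attribute vector and a class label.

```python
import math

def classify(predict_proba, a):
    """Equation 1: h(b) = argmax_c p(c | b)."""
    posteriors = predict_proba(a)
    return max(posteriors, key=posteriors.get)

def conditional_log_likelihood(predict_proba, samples):
    """Equation 2: CLL(B | S) = sum over the samples in S of log P_B(c | a)."""
    cll = 0.0
    for a, c in samples:                    # each sample is (attributes, class label)
        p = predict_proba(a).get(c, 1e-9)   # floor to avoid log(0)
        cll += math.log(p)
    return cll
```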

On a large suite of benchmark sample sets, our empirical results show that CLLTree significantly outperforms state-of-the-art learning algorithms, such as NBTree and naïve Bayes, in yielding accurate probability estimation, classification accuracy and AUC.

This paper is organized as follows: Section 2 reviews existing work on using decision trees to estimate probabilities; Section 3 presents the principles of CLLTree; Section 4 presents the learning algorithm; Section 5 reports the experimental results; and Section 6 concludes.

2 Related Work in Decision Tree Based Probability Estimation

Decision tree learning algorithms are a major type of effective learning algorithm in machine learning. In a decision boundary-based algorithm, an explicit decision boundary is extracted from the training samples, and a testing sample E is classified into class c if E falls into the decision area corresponding to c. However, traditional decision tree algorithms, such as C4.5 [10], have been observed to produce poor probability estimates [9]. Typically, the probability generated from a decision tree is calculated from the sub sample sets at the leaves, corresponding to the conjunction of the conditions along the path back to the root [8]. For example, if a leaf node defines a subset of 100 samples, 90 of which are in the positive class and the rest in the negative class, then each sample at that leaf is assigned the same probability of 0.9 (90/100) of belonging to the positive class. More specifically, \hat{p}(+|A_p = a_p) equals 90%, where A_p is the set of attributes on the path. Viewed as probability estimators, decision trees thus consist of piecewise uniform approximations within regions defined by axis-parallel boundaries.

To address this, Provost and Domingos [8] presented two methods to improve the probability estimation of decision trees. First, Laplace estimation smooths the probability estimates obtained from the small sample sets at the tree leaves. Second, by turning off pruning and collapsing in C4.5, decision trees become finer and give more precise probability estimates. The final version is called C4.4. In this paper, we compare our new algorithm with C4.4 and its variants.

Another improvement that tackles the uniform probability distribution problem of decision trees is to stop splitting at a certain level and place another probability density estimator at each leaf. [5] proposed NBTree, which uses a decision tree as the general structure and deploys naïve Bayes classifiers at the leaves. This learning algorithm uses classification accuracy as the score function for univariate splits, and when splitting no longer increases the score function, a naïve Bayesian classifier is created at the leaf. Thus, the sample attributes are divided into two sets: A = A_p ∪ A_l, where A_p is the set of path attributes and A_l is the set of leaf attributes. [5] showed an improvement in classification accuracy but did not report performance on probability estimation.

[13] proposed an encoding of p(C, A) for NBTree. The proposed Conditional Independence Tree (CITree) represents p(A, C) as follows:

    p(A, C) = \alpha \, p(C|A_p(L)) \, p(A_l(L)|A_p(L), C)    (3)

where \alpha is a normalization factor. The term p(C|A_p(L)) is the conditional distribution of the class given the path attributes, and the term p(A_l(L)|A_p(L), C) is the distribution over the leaf attributes, represented by naïve Bayes. From the conditional independence assumption of naïve Bayes, the following equation holds:

    p(A_l|A_p(L), C) = \prod_{i=1}^{n} p(A_{li}|A_p(L), C)    (4)

A CITree explicitly defines conditional dependence among the path attributes and independence among the leaf attributes. The local conditional independence assumption of CITree is a relaxation of the (global) conditional independence assumption of naïve Bayes. Further study in [11] reveals that local conditional independence explains the replication problem, and that one CITree can be decomposed into a set of trees to eliminate the replicated subtrees; the complexity of the sum of these trees is not greater than that of the original one.

Building decision trees with accurate probability estimation, called Probability Estimation Trees (PETs), has received a great deal of attention recently [8]. The difference between a PET and a CITree is that a PET represents the conditional probability distribution given the path attributes, while a CITree represents a joint distribution over all attributes.

Another line of related work involves Bayesian networks [7]. Bayesian networks are directed acyclic graphs that encode conditional independence among a set of random variables; each variable is independent of its non-descendants in the graph given the state of its parents. Tree Augmented Naïve Bayes (TAN), proposed by [3], approximates the interaction between attributes by using a tree structure imposed on the naïve Bayesian framework. We need to point out that, although TAN takes advantage of a tree structure, it is not similar to a decision tree. Decision trees divide a sample space into multiple subspaces, and local conditional probabilities are independent among those subspaces; therefore attributes can appear repeatedly in a decision tree. TAN, by contrast, describes the joint probabilities among attributes, so each attribute appears only once. In decision trees, p(C|A) is decomposable when a (sub) sample set is split into subspaces, but it is non-decomposable in TAN.
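As a small illustration of the CITree decomposition in Equations 3 and 4 (a sketch under stated assumptions, not the authors' implementation), the joint probability of a sample reaching a leaf L can be assembled from a path term and a naïve Bayes product over the leaf attributes; `p_class_given_path` and `p_leaf_attr` are hypothetical lookup tables estimated from the samples that reach the leaf.

```python
def joint_at_leaf(c, leaf_values, p_class_given_path, p_leaf_attr, alpha=1.0):
    """Equation 3: p(A, C) = alpha * p(C | A_p(L)) * p(A_l(L) | A_p(L), C)."""
    prob = alpha * p_class_given_path[c]        # path term p(C | A_p(L))
    for attr, value in leaf_values.items():     # Equation 4: product over leaf attributes
        prob *= p_leaf_attr[(attr, value, c)]   # p(A_li = value | A_p(L), C = c)
    return prob
```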

3 Learning Naïve Bayes Tree for Conditional Probability Estimation

3.1 The performance evaluation metrics

Accurate conditional probability p(C|A) is important for many applications. Since log p is a monotonic function of p and we use the conditional log likelihood (CLL) for calculation, we use CLL and conditional probability interchangeably hereafter. In cost-sensitive classification, the optimal prediction for a sample b is the class c_i that minimizes the expected cost [2]:

    h(b) = \arg\min_{c_i \in C} \sum_{c_j \in C \setminus \{c_i\}} p(c_j|b) \, C(c_i, c_j)    (5)

where C(c_i, c_j) is the cost of predicting class c_i when the true class is c_j. One can see that the score function in cost-sensitive learning relies directly on the conditional probability. Unlike the classification problem, where only the decision boundary matters, cost-sensitive learning requires accurate estimation of conditional probability.

Better conditional probability estimation means better classification accuracy (ACC) (cf. Equation 1). ACC is calculated as the percentage of correctly classified samples over all samples: ACC = (1/N) \sum I(h(a) = c), where N is the number of samples. However, improving conditional probability estimation is not a necessary condition for improving ACC. ACC can be raised in other ways, e.g. by boundary-based approaches. On the other hand, even if the conditional probability estimate is greatly improved, it may still lead to a wrong classification. For instance, suppose that E+ is a positive sample that is misclassified into the negative class with a class probability estimate \hat{p}(+|E+) = 0.3. If the classification threshold is 0.5, an algorithm that improves this class probability estimate to \hat{p}(+|E+) = 0.4 still gives the incorrect result. Therefore, in this case, improving CLL does not improve ACC; at least, however, a more precise estimate of CLL does not worsen ACC.

Ranking is different from both classification and probability estimation. For example, assume that E+ and E− are a positive and a negative sample respectively, and that the actual class probabilities are p(+|E+) = 0.9 and p(+|E−) = 0.1. An algorithm that gives class probability estimates \hat{p}(+|E+) = 0.55 and \hat{p}(+|E−) = 0.54 ranks E+ and E− in the correct order, even though the probability estimates are poor and the classification of E− is incorrect. However, if a learning algorithm produces accurate class probability estimates, it certainly produces a precise ranking. Thus, aiming to learn a model that yields accurate conditional probability estimation will usually lead to a model that yields a precise probability-based ranking.

In this paper, we use three different metrics, CLL, ACC and AUC, to evaluate learning algorithms.
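To make Equation 5 concrete, here is a minimal sketch (an illustration, not part of the paper) of the cost-sensitive optimal prediction: `posteriors` maps each class to p(c|b), and `cost[(c_i, c_j)]` is the cost of predicting c_i when the true class is c_j. With asymmetric costs, the decision hinges on the actual posterior values rather than on the decision boundary alone.

```python
def cost_sensitive_predict(posteriors, cost):
    """Equation 5: pick the class c_i minimizing the expected misclassification cost."""
    def expected_cost(c_i):
        return sum(p * cost[(c_i, c_j)]
                   for c_j, p in posteriors.items() if c_j != c_i)
    return min(posteriors, key=expected_cost)

# Example: the minority class can still be the optimal prediction under asymmetric costs.
posteriors = {"+": 0.2, "-": 0.8}
cost = {("+", "-"): 1.0, ("-", "+"): 10.0}
print(cost_sensitive_predict(posteriors, cost))   # -> "+"
```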

3.2 The representation of CLL in CLLTree

The representation of conditional probability in the framework of CLLTree is as follows:

    \log p(C|A) = \log p(C|A_l, A_p) = \log p(C|A_p) + \log p(A_l|C, A_p) - \log p(A_l|A_p)    (6)

A_p divides a (sub) sample set into several subsets, and all decomposed terms are conditioned on A_p: p(C|A_p) is the conditional probability given the path attributes; p(A_l|C, A_p) is the naïve Bayesian classifier at a leaf; and p(A_l|A_p) is the joint probability of A_l under the condition of A_p.

In each step of generating the decision tree, CLL is calculated based on Equation 6. Let A_{li} denote a leaf attribute. Here, p(C|A_p) is calculated as the ratio of the number of samples with a given class value to the number of all samples at a leaf; p(A_l|C, A_p) can be represented by \prod_{i=1}^{m} p(A_{li}|C, A_p) (m is the number of attributes at a leaf node), and each p(A_{li}|C, A_p) is calculated as the ratio of the number of samples with the same value of A_{li} and the same class value to the number of samples with that class value; likewise, p(A_l|A_p) can be represented by \prod_{i=1}^{m} p(A_{li}|A_p), and each p(A_{li}|A_p) is calculated as the ratio of the number of samples with the same value of A_{li} to the number of samples at that leaf.

The attribute that optimizes CLL is selected as the next-level node to extend the tree. We exhaustively build all possible trees in each step and keep only the best one for the next-level expansion. Supposing a finite number k of attributes is available, there are k − q + 1 attributes to choose from when expanding the tree at level q. This is a greedy strategy. CLLTree makes an assumption on probability, namely probability dependence on the path attributes and probability independence among the leaf attributes. It also makes an assumption on the structure, namely that each node has only one parent.

4 A New Algorithm for Learning CLLTree

As discussed in the previous sections, a CLLTree can represent any joint distribution, so probability estimation based on a CLLTree can be accurate. However, structure learning of a CLLTree could theoretically be as time-consuming as learning an optimal decision tree. A good approximation of a CLLTree, which gives relatively accurate estimates of class probabilities, is therefore desirable in many applications.

Similar to a decision tree, a CLLTree can be built by a greedy and recursive process: on each iteration, choose the best attribute as the root of the (sub) tree, split the associated data into disjoint subsets corresponding to the values of that attribute, and recurse on each subset until certain criteria are satisfied. Once the structure of a CLLTree is determined, a leaf naïve Bayes is a natural model to represent the local conditional distribution at the leaves. The algorithm is described below (Algorithm 1).

In our algorithm, we adopt a heuristic search process in which we choose the attribute with the greatest improvement in the performance of the resulting tree. Precisely speaking, on each iteration, each candidate attribute is chosen as the root of the (sub) tree, the resulting tree is evaluated, and we choose the attribute that achieves the highest CLL value. We consider two criteria for halting the search process: we stop splitting when none of the alternative attributes significantly improves probability estimation, in the form of CLL, or when there are fewer than 30 samples at the current node, so that a leaf naïve Bayes has enough samples to work accurately. We define a split to be significant if the relative reduction in CLL is greater than 5%. Note that we train a leaf naïve Bayes by adopting an inner 5-fold cross-validation on the sub sample set S that falls into the current leaf. For example, if an attribute has 3 values, which results in three leaf naïve Bayes classifiers, the inner 5-fold cross-validation is run in each of the three leaves. Furthermore, we compute CLL by putting the samples from all the leaves together rather than computing the CLL for each leaf separately.
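The count-based estimates just described can be turned into a CLL score for a candidate leaf. The following rough sketch (an illustration, not the authors' implementation) scores the samples at one leaf by Equation 6 using the frequency ratios above; the CLL of a whole split is the sum over its leaves, which is the same as pooling the samples from all the leaves.

```python
import math
from collections import Counter

def leaf_cll(samples, leaf_attrs):
    """CLL contribution of one leaf; each sample is (attribute_dict, class_label)."""
    n = len(samples)
    class_counts = Counter(c for _, c in samples)
    cll = 0.0
    for a, c in samples:
        log_p = math.log(class_counts[c] / n)              # log p(C | A_p)
        for attr in leaf_attrs:
            n_vc = sum(1 for a2, c2 in samples if a2[attr] == a[attr] and c2 == c)
            n_v = sum(1 for a2, _ in samples if a2[attr] == a[attr])
            log_p += math.log(n_vc / class_counts[c])      # + log p(A_li | C, A_p)
            log_p -= math.log(n_v / n)                     # - log p(A_li | A_p)
        cll += log_p
    return cll

def split_cll(partitioned_subsets, leaf_attrs):
    """CLL of a candidate split: sum the contributions of its non-empty leaves."""
    return sum(leaf_cll(subset, leaf_attrs) for subset in partitioned_subsets if subset)
```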

Algorithm 1 Learning Algorithm CLLTree(T, S, A)

    T: a CLLTree
    S: a set of labeled samples
    A: a set of attributes

    for each attribute a ∈ A do
        Partition S into S_1, S_2, ..., S_k, where k is the number of possible values of attribute a; each subset corresponds to one value of a. For continuous attributes, a threshold is set up in this step.
        Create a naïve Bayes for each S_i.
        Evaluate the split on attribute a in terms of CLL.
    Choose the attribute A_t with the highest split CLL.
    if the split CLL of A_t does not greatly improve on the CLL of the current node then
        Create a leaf naïve Bayes for this node.
    else
        for each value of A_t, with corresponding subset S_a, do
            T_a ← CLLTree(T_a, S_a, A − {A_t})
            Add T_a as a child of T.
    Return T.

It is also worth noting the different biases between learning a CLLTree and learning a traditional decision tree. In a decision tree, the building process is guided by the purity of the (sub) sample set, measured by information gain, and the crucial point in selecting an attribute is whether the resulting split of the samples is pure. However, such a selection strategy does not necessarily improve the probability estimates for new samples. In building a CLLTree, we intend to choose the attributes that maximize the posterior class probabilities p(C|A) of the samples at the current leaf as much as possible. This means that, even if its leaves have high impurity, a tree can still be a good CLLTree. Therefore, traditional decision tree learning algorithms are not straightforwardly suitable for learning a CLLTree.
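A hedged Python sketch of the recursive procedure in Algorithm 1 is given below. The leaf naïve Bayes learner and the CLL scoring are abstracted behind the assumed hooks `fit_leaf`, `leaf_cll` and `split_cll` (along the lines of the count-based scoring sketched in Section 3.2); the 30-sample minimum and the 5% relative-improvement threshold follow the text, while the inner 5-fold cross-validation is omitted for brevity.

```python
from collections import defaultdict

MIN_LEAF_SAMPLES = 30      # require enough samples for a leaf naive Bayes
MIN_RELATIVE_GAIN = 0.05   # a split must improve CLL by at least 5% (relative)

def partition(samples, attr):
    """Split the samples into disjoint subsets, one per value of `attr`."""
    groups = defaultdict(list)
    for a, c in samples:
        groups[a[attr]].append((a, c))
    return groups

def learn_clltree(samples, attributes, fit_leaf, leaf_cll, split_cll):
    # create a leaf naive Bayes when too few samples or no attributes remain
    if len(samples) < MIN_LEAF_SAMPLES or not attributes:
        return {"leaf": fit_leaf(samples)}

    current = leaf_cll(samples, attributes)
    best_attr, best = None, current
    for attr in attributes:                 # greedy: score every candidate root
        score = split_cll(samples, attr, attributes)
        if score > best:
            best_attr, best = attr, score

    # split only if the relative improvement in CLL exceeds the threshold
    if best_attr is None or (best - current) < MIN_RELATIVE_GAIN * abs(current):
        return {"leaf": fit_leaf(samples)}

    rest = [a for a in attributes if a != best_attr]
    children = {value: learn_clltree(subset, rest, fit_leaf, leaf_cll, split_cll)
                for value, subset in partition(samples, best_attr).items()}
    return {"split_on": best_attr, "children": children}
```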

5 Experimental Methodology and Results

For the purposes of our study, we used 33 well-recognized sample sets from many domains, recommended by Weka [12]. A brief description of these sample sets is given in Table 1. All sample sets came from the UCI repository [1]. The preprocessing of the sample sets was carried out within the Weka platform and mainly included the following three steps:

1. Applying the filter ReplaceMissingValues in Weka to replace the missing values of attributes.
2. Applying the filter Discretize in Weka to discretize numeric attributes, so that all attributes are treated as nominal.
3. It is well known that, if the number of values of an attribute is almost equal to the number of samples in a sample set, the attribute contributes no information to classification. We therefore used the filter Remove in Weka to delete such attributes. Three occurred within the 33 sample sets, namely Hospital Number in sample set horse-colic.orig, Instance Name in sample set splice and Animal in sample set zoo.

Table 1. Description of the sample sets used in the experiments.

Data Set       Missing Values   Numeric Attributes
anneal         Y                Y
anneal.orig    Y                Y
audiology      Y                N
balance        N                Y
breast         Y                N
breast-w       Y                N
colic          Y                Y
colic.orig     Y                Y
credit-a       Y                Y
credit-g       N                Y
diabetes       N                Y
glass          N                Y
heart-c        Y                Y
heart-h        Y                Y
heart-s        N                Y
hepatitis      Y                Y
hypoth.        Y                Y
iris           N                Y
kr-vs-kp       N                N
labor          Y                Y
letter         N                Y
lymph          N                Y
mushroom       Y                N
p.-tumor       Y                N
segment        N                Y
sick           Y                Y
soybean        Y                N
splice         N                N
vehicle        N                Y
vote           Y                N
vowel          N                Y
waveform       N                Y
zoo            N                Y

To avoid the zero-frequency problem, we used the Laplace estimation. More precisely, assume that there are n_c samples with class label c, t samples in total, and k class values in a sample set. The frequency-based probability estimate is p(c) = n_c / t, whereas the Laplace estimate is p(c) = (n_c + 1) / (t + k). Similarly, p(a_i|c) is estimated as p(a_i|c) = (n_{ic} + 1) / (n_c + v_i), where v_i is the number of values of attribute A_i and n_{ic} is the number of samples in class c with A_i = a_i.
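A small sketch of the Laplace estimates given above (straightforward transcriptions of the formulas, not the Weka code):

```python
def laplace_class_prob(n_c, t, k):
    """p(c) = (n_c + 1) / (t + k): n_c samples of class c, t samples in total, k class values."""
    return (n_c + 1) / (t + k)

def laplace_conditional_prob(n_ic, n_c, v_i):
    """p(a_i | c) = (n_ic + 1) / (n_c + v_i): n_ic samples of class c with A_i = a_i,
    n_c samples of class c, v_i possible values of attribute A_i."""
    return (n_ic + 1) / (n_c + v_i)
```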

In our experiments, two groups of comparisons were performed: we compared CLLTree with naïve Bayes related algorithms, namely NBTree, NB and TAN, and with PET variant algorithms, namely C4.4, C4.4-B (C4.4 with bagging), C4.5-L (C4.5 with Laplace estimation) and C4.5-B (C4.5 with bagging). We implemented CLLTree within the Weka framework [12] and used the Weka implementations of the other learning algorithms.

In all experiments, the result for each algorithm was measured via ten-fold cross validation. Runs of the various algorithms were carried out on the same training sets and evaluated on the same test sets; in particular, the cross-validation folds were the same for all experiments on each sample set. Finally, we conducted a two-tailed t-test at the 95% confidence level to compare our algorithm with the others. That is, we speak of two results on a sample set as being significantly different only if the difference is statistically significant at the 0.05 level according to the corrected two-tailed t-test [6] (a sketch of this win/tie/loss tallying follows the observations below).

Tables 2 and 4 show the experimental results in terms of CLL and AUC, and the corresponding summaries of t-test results are given in Tables 3 and 5. Multi-class AUC was calculated by the M-measure [4]. Tables 6 and 7 display the ACC comparison and the corresponding t-test results, respectively. In all t-test tables, an entry w/t/l means that the algorithm in the corresponding row wins in w sample sets, ties in t sample sets, and loses in l sample sets, compared with the algorithm in the corresponding column. Our observations are summarized as follows.

1. CLLTree outperforms NBTree significantly in terms of CLL and AUC, and is slightly better in ACC. The detailed results for CLL (Table 3) show that CLLTree wins in 10 sample sets, ties in 23 and loses in none. For AUC (Table 5), CLLTree wins in 5 sample sets, ties in 27 and loses in only one. In addition, CLLTree surpasses NBTree in ACC as well: it wins in 3 sample sets and loses in 1.

2. CLLTree is the best among the remaining learning algorithms in AUC. Compared with C4.4, it wins in 19 sample sets, ties in 14 and loses in none. Since C4.4 is the state-of-the-art decision tree algorithm designed specifically for yielding accurate rankings, this comparison also provides evidence in support of CLLTree. Compared with naïve Bayes, our algorithm wins in 9 sample sets, ties in 21 and loses in 3.

3. In terms of average classification accuracy (Table 6), CLLTree achieves the highest ACC among all algorithms. Compared with naïve Bayes, it wins in 11 sample sets, ties in 21 and loses in 1; the average ACC of naïve Bayes is 82.82%, lower than that of CLLTree. Furthermore, CLLTree also outperforms TAN significantly: it wins in 6 sample sets, ties in 24 and loses in 3, and the average ACC of TAN, 84.64%, is likewise lower than that of our algorithm. Finally, CLLTree is also better than C4.5, the representative traditional decision tree learner, in 8 sample sets.

4. Although C4.4 outperforms CLLTree in CLL, our algorithm is clearly better than C4.4 in overall performance. C4.4 sacrifices tree size to improve probability estimation, which can cause overfitting and sensitivity to noise. Therefore, from a practical perspective, CLLTree is more suitable for many real applications.
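For the tallies above, the following sketch shows how a single win/tie/loss entry can be computed from the per-fold scores of two algorithms. It uses a corrected paired t-test in the spirit of [6]; the variance correction with the 10-fold train/test ratio of 1/9 is an assumption of this sketch, not a description of the exact Weka procedure.

```python
import math
from statistics import mean, stdev
from scipy.stats import t as t_dist

def win_tie_loss(scores_a, scores_b, train_test_ratio=1/9, alpha=0.05):
    """Compare algorithm A against B on matched cross-validation folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        return "tie"
    # corrected resampled t-statistic: variance inflated by the train/test ratio
    t_stat = mean(diffs) / (sd * math.sqrt(1.0 / k + train_test_ratio))
    p_value = 2 * t_dist.sf(abs(t_stat), df=k - 1)     # two-tailed p-value
    if p_value >= alpha:
        return "tie"
    return "win" if mean(diffs) > 0 else "loss"
```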

From our experiments, we also made an interesting observation about training time: CLLTree trains substantially faster than NBTree, winning in 30 sample sets, tying in 2 and losing in 1. We do not present these results here due to space limitations.

6 Conclusion

In this paper, we have proposed a novel algorithm, CLLTree, to improve probability estimation in NBTree. The empirical results support our expectation: compared with other classic learning algorithms, CLL and AUC are significantly improved and ACC is slightly better. There is still room to improve probability estimation. For example, after the structure is learned, we could use parameter learning algorithms to tune the conditional probability estimates on the path attributes. We could also seek the right tree size for our model, e.g. by using model-selection criteria to decide when to stop splitting.

References

1. C. Blake and C. J. Merz. UCI repository of machine learning databases.
2. C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001.
3. N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29, 1997.
4. D. J. Hand and R. J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45, 2001.
5. R. Kohavi. Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996.
6. C. Nadeau and Y. Bengio. Inference for the generalization error. Machine Learning, 52(3), 2003.
7. J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
8. F. J. Provost and P. Domingos. Tree induction for probability-based ranking. Machine Learning, 52(3), 2003.
9. F. J. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, 1998.
10. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
11. J. Su and H. Zhang. Representing conditional independence using decision trees. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05). AAAI Press, 2005.
12. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
13. H. Zhang and J. Su. Conditional independence trees. In Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Springer, 2004.

Table 2. Experimental results for CLLTree versus Naïve Bayes Tree (NBTree), naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), C4.4 and C4.4 with bagging (C4.4-B): Conditional Log Likelihood (CLL) and standard deviation on each of the 33 sample sets; marked entries indicate a statistically significant degradation or improvement compared with CLLTree.

Table 3. Summary on t-test of experimental results: CLL comparisons on CLLTree, NBTree, NB, TAN, C4.4 and C4.4-B. An entry w/t/l means that the algorithm in the corresponding row wins in w sample sets, ties in t sample sets, and loses in l sample sets, compared to the algorithm in the corresponding column.

            C4.4-B     C4.4       TAN        NB         NBTree
C4.4        19/7/7
TAN         16/12/5    8/17/8
NB          14/9/10    5/14/14    3/18/12
NBTree      10/15/8    5/16/12    2/20/11    7/22/4
CLLTree     14/13/6    6/19/8     5/23/5     11/21/1    10/23/0

Table 4. Experimental results for CLLTree versus Naïve Bayes Tree (NBTree), naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), C4.4 and C4.4 with bagging (C4.4-B): Area Under the Curve (AUC) and standard deviation on each of the 33 sample sets; marked entries indicate a statistically significant degradation or improvement compared with CLLTree. CLLTree's average AUC over the 33 sample sets is 89.83.

Table 5. Summary on t-test of experimental results: AUC comparisons on CLLTree, NBTree, NB, TAN, C4.4 and C4.4-B.

            C4.4-B     C4.4       TAN        NB         NBTree
C4.4        0/15/18
TAN         12/19/2    18/13/2
NB          14/12/7    21/7/5     4/20/9
NBTree      8/20/5     19/12/2    3/25/5     7/25/1
CLLTree     14/16/3    19/14/0    6/25/2     9/21/3     5/27/1

Table 6. Experimental results for CLLTree versus Naïve Bayes Tree (NBTree), naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN), C4.5, C4.5 with Laplace estimation (C4.5-L) and C4.5 with bagging (C4.5-B): Classification Accuracy (ACC) and standard deviation on each of the 33 sample sets; marked entries indicate a statistically significant degradation or improvement compared with CLLTree. CLLTree's average ACC over the 33 sample sets is 85.13%.

Table 7. Summary on t-test of experimental results: ACC comparisons on CLLTree, NBTree, NB, TAN, C4.5, C4.5-L and C4.5-B.

            C4.5       C4.5-L     C4.5-B     TAN        NB         NBTree
C4.5-L      3/30/0
C4.5-B      6/27/0     7/26/0
TAN         8/22/3     10/19/4    3/25/5
NB          8/13/12    8/14/11    5/15/13    3/19/11
NBTree      7/24/2     8/24/1     5/25/3     3/26/4     11/22/0
CLLTree     8/23/2     7/23/3     4/26/3     6/24/3     11/21/1    3/29/1


More information

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge

Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February

More information

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom

More information

Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations

Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations Katarzyna Stapor (B) Institute of Computer Science, Silesian Technical University, Gliwice, Poland katarzyna.stapor@polsl.pl

More information

IT Students Workshop within Strategic Partnership of Leibniz University and Peter the Great St. Petersburg Polytechnic University

IT Students Workshop within Strategic Partnership of Leibniz University and Peter the Great St. Petersburg Polytechnic University IT Students Workshop within Strategic Partnership of Leibniz University and Peter the Great St. Petersburg Polytechnic University 06.11.16 13.11.16 Hannover Our group from Peter the Great St. Petersburg

More information

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany

Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Entrepreneurial Discovery and the Demmert/Klein Experiment: Additional Evidence from Germany Jana Kitzmann and Dirk Schiereck, Endowed Chair for Banking and Finance, EUROPEAN BUSINESS SCHOOL, International

More information

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration

Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One

More information

How do adults reason about their opponent? Typologies of players in a turn-taking game

How do adults reason about their opponent? Typologies of players in a turn-taking game How do adults reason about their opponent? Typologies of players in a turn-taking game Tamoghna Halder (thaldera@gmail.com) Indian Statistical Institute, Kolkata, India Khyati Sharma (khyati.sharma27@gmail.com)

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Data Stream Processing and Analytics

Data Stream Processing and Analytics Data Stream Processing and Analytics Vincent Lemaire Thank to Alexis Bondu, EDF Outline Introduction on data-streams Supervised Learning Conclusion 2 3 Big Data what does that mean? Big Data Analytics?

More information

Lecture 1: Basic Concepts of Machine Learning

Lecture 1: Basic Concepts of Machine Learning Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010

More information

Learning and Transferring Relational Instance-Based Policies

Learning and Transferring Relational Instance-Based Policies Learning and Transferring Relational Instance-Based Policies Rocío García-Durán, Fernando Fernández y Daniel Borrajo Universidad Carlos III de Madrid Avda de la Universidad 30, 28911-Leganés (Madrid),

More information

Seminar - Organic Computing

Seminar - Organic Computing Seminar - Organic Computing Self-Organisation of OC-Systems Markus Franke 25.01.2006 Typeset by FoilTEX Timetable 1. Overview 2. Characteristics of SO-Systems 3. Concern with Nature 4. Design-Concepts

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees

Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees Mariusz Łapczy ski 1 and Bartłomiej Jefma ski 2 1 The Chair of Market Analysis and Marketing Research,

More information

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education

GCSE Mathematics B (Linear) Mark Scheme for November Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education GCSE Mathematics B (Linear) Component J567/04: Mathematics Paper 4 (Higher) General Certificate of Secondary Education Mark Scheme for November 2014 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge

More information

WHEN THERE IS A mismatch between the acoustic

WHEN THERE IS A mismatch between the acoustic 808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

A Comparison of Standard and Interval Association Rules

A Comparison of Standard and Interval Association Rules A Comparison of Standard and Association Rules Choh Man Teng cmteng@ai.uwf.edu Institute for Human and Machine Cognition University of West Florida 4 South Alcaniz Street, Pensacola FL 325, USA Abstract

More information

A NEW ALGORITHM FOR GENERATION OF DECISION TREES

A NEW ALGORITHM FOR GENERATION OF DECISION TREES TASK QUARTERLY 8 No 2(2004), 1001 1005 A NEW ALGORITHM FOR GENERATION OF DECISION TREES JERZYW.GRZYMAŁA-BUSSE 1,2,ZDZISŁAWS.HIPPE 2, MAKSYMILIANKNAP 2 ANDTERESAMROCZEK 2 1 DepartmentofElectricalEngineeringandComputerScience,

More information

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques

ScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies

More information

Corrective Feedback and Persistent Learning for Information Extraction

Corrective Feedback and Persistent Learning for Information Extraction Corrective Feedback and Persistent Learning for Information Extraction Aron Culotta a, Trausti Kristjansson b, Andrew McCallum a, Paul Viola c a Dept. of Computer Science, University of Massachusetts,

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier

Analysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion

More information