Decision Tree. Machine Learning. Hamid Beigy. Sharif University of Technology. Fall 1396

Table of contents

1. Introduction
2. Decision tree classification
3. Building decision trees
4. ID3 Algorithm

Introduction

1. The decision tree is a classic and natural model of learning.
2. It is closely related to the notion of divide and conquer.
3. A decision tree partitions the instance space into axis-parallel regions, each labeled with a class value.
4. Why decision trees?
   - Interpretable; popular in medical applications because they mimic the way a doctor thinks.
   - Can model discrete outcomes nicely.
   - Can be very powerful and as complex as you need them.
   - C4.5 and CART decision trees are very popular.

Decision tree classification

1. Structure of decision trees
   - Each internal node tests an attribute.
   - Each branch corresponds to an attribute value.
   - Each leaf node assigns a classification.
2. Decision tree for PlayTennis
   [Figure: the root tests Outlook [9+,5-]. Sunny [2+,3-] leads to a Humidity test (High -> No [0+,3-], Normal -> Yes [2+,0-]); Overcast leads directly to Yes [4+,0-]; Rain [3+,2-] leads to a Wind test (Strong -> No [0+,2-], Light -> Yes [3+,0-]).]
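To make this structure concrete, here is a minimal sketch (mine, not from the slides) that stores the PlayTennis tree as nested tuples and classifies an example; the representation and names are illustrative assumptions.

```python
# Internal nodes are (attribute, {value: subtree}) pairs; leaves are labels.
play_tennis_tree = (
    "Outlook",
    {
        "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
        "Overcast": "Yes",
        "Rain": ("Wind", {"Strong": "No", "Light": "Yes"}),
    },
)

def classify(tree, example):
    """Walk from the root to a leaf, following the branch that matches
    the example's value for each tested attribute."""
    while isinstance(tree, tuple):      # still at an internal node
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree                         # a leaf, i.e., a class label

# A Sunny/Hot/High/Light day is classified No (don't play).
print(classify(play_tennis_tree, {"Outlook": "Sunny", "Temperature": "Hot",
                                  "Humidity": "High", "Wind": "Light"}))
```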

Building decision trees

The idea of binary classification trees is not unlike that of the histogram classifier: partition the feature space into cells and assign a label to each cell.

[Figure 9.2: decision surfaces of (a) a histogram classifier; (b) a linear classifier; (c) a tree classifier.]

Building decision trees (growing trees)

The growing process is based on recursively subdividing the feature space. Usually the subdivisions split an existing region into two smaller regions (i.e., binary splits), and for simplicity the splits are perpendicular to one of the feature axes. [Figure 9.3: growing a recursive binary tree on X = [0,1]^2.]

Often the splitting process is based on the training data and is designed to separate data with different labels as much as possible. In such constructions the splits, and hence the tree structure itself, are data dependent, which causes major difficulties for the analysis (and tuning) of these methods. Alternatively, the splitting and subdivision can be chosen independently of the training data; this approach is more amenable to analysis, as in Dyadic Decision Trees and Recursive Dyadic Partitions (Figure 9.4). A common strategy is to first grow a very large tree, which may have poor generalization characteristics, and then prune it to avoid overfitting.

Building decision trees (trees and partitions)

How do trees relate to partitions? Any decision tree can be associated with a partition of the input space X, and vice versa. In particular, a Recursive Dyadic Partition (RDP) can be associated with a binary tree; in fact, this is the most efficient way of describing an RDP. Each leaf of the tree corresponds to a cell of the partition, and the internal nodes correspond to the partition cells generated during the construction of the tree. The orientation of the dyadic split alternates between the levels of the tree (at the root level the split is along the horizontal axis, at the next level along the vertical axis, and so on). The tree is called dyadic because cells are always split at the midpoint along one coordinate axis, so the side lengths of all cells are dyadic (i.e., powers of 2).

1. Decision trees recursively subdivide the feature space.
2. The test variable specifies the division.

[Figure 9.4: example of growing a Recursive Dyadic Partition on X = [0,1]^2.]

Building decision trees (example)

Training examples for the concept PlayTennis:

Day  Outlook   Temperature  Humidity  Wind    PlayTennis?
1    Sunny     Hot          High      Light   No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Light   Yes
4    Rain      Mild         High      Light   Yes
5    Rain      Cool         Normal    Light   Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Light   No
9    Sunny     Cool         Normal    Light   Yes
10   Rain      Mild         Normal    Light   Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Light   Yes
14   Rain      Mild         High      Strong  No

How will ID3 construct a decision tree? Build the tree using Gain(·).

Building decision trees (cont.)

How to build a decision tree?
1. Start at the top of the tree.
2. Grow it by splitting attributes one by one.
3. Assign leaf nodes.
4. When we get to the bottom, prune the tree to prevent overfitting.

How to choose a test variable for an internal node? Choosing different impurity measures results in different algorithms; we describe ID3.

[Figure: node impurity measures for two-class classification as a function of the proportion p in class 2: Gini index, entropy (scaled to pass through (0.5, 0.5)), and misclassification error.]
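As a quick illustration (a sketch of mine, not from the slides), the three standard two-class impurity measures from the figure can be computed as functions of the class-2 proportion p:

```python
import math

def gini(p):
    """Gini index for a two-class node: 2p(1-p)."""
    return 2 * p * (1 - p)

def entropy(p):
    """Entropy in bits: -p log2 p - (1-p) log2 (1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def misclassification(p):
    """Misclassification error: 1 - max(p, 1-p)."""
    return 1 - max(p, 1 - p)

# All three vanish for pure nodes (p = 0 or 1) and peak at p = 0.5.
for p in (0.0, 0.25, 0.5):
    print(p, gini(p), entropy(p), misclassification(p))
```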

Building decision trees (cont.)

ID3 uses information gain to choose a test variable for an internal node. The information gain of D relative to attribute A is the expected reduction in entropy due to splitting on A:

Gain(D, A) = H(D) - \sum_{v \in values(A)} \frac{|D_v|}{|D|} H(D_v)

where D_v = \{x \in D : x.A = v\} is the set of examples in D where attribute A has value v.
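The definition translates directly into code. Below is a minimal sketch (mine; the dataset layout, a list of dicts with a "label" key, is an assumption for illustration):

```python
import math
from collections import Counter

def entropy(examples):
    """H(D) = -sum_c p_c log2 p_c over the class labels in D."""
    n = len(examples)
    counts = Counter(x["label"] for x in examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(examples, attribute):
    """Gain(D, A) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    n = len(examples)
    remainder = 0.0
    for v in {x[attribute] for x in examples}:
        subset = [x for x in examples if x[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

data = [{"Wind": "Strong", "label": "No"}, {"Wind": "Light", "label": "Yes"},
        {"Wind": "Light", "label": "Yes"}, {"Wind": "Strong", "label": "Yes"}]
print(gain(data, "Wind"))   # 0.311 bits of entropy removed by the Wind test
```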

ID3 Algorithm (example)

For the PlayTennis training examples above:

H(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94 bits
H(D_{Humidity=High}) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.985 bits
H(D_{Humidity=Normal}) = -(6/7) log2(6/7) - (1/7) log2(1/7) = 0.592 bits

Gain(D, Humidity) = 0.94 - (7/14)(0.985) - (7/14)(0.592) = 0.151 bits
Gain(D, Wind) = 0.94 - (8/14)(0.811) - (6/14)(1.0) = 0.048 bits

where 0.811 is the entropy of the eight Wind = Light examples (six positive, two negative).
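These numbers can be verified in a few lines (a standalone sketch of mine):

```python
import math

def h(p, n):
    """Entropy in bits of a node with p positive and n negative examples."""
    total = p + n
    return sum(-c / total * math.log2(c / total) for c in (p, n) if c)

print(round(h(9, 5), 3))                                  # H(D) = 0.94
print(round(h(3, 4), 3), round(h(6, 1), 3))               # Humidity: 0.985 and 0.592
print(round(h(9, 5) - 7/14*h(3, 4) - 7/14*h(6, 1), 3))    # 0.152 (slide rounds to 0.151)
print(round(h(9, 5) - 8/14*h(6, 2) - 6/14*h(3, 3), 3))    # Gain(D, Wind) = 0.048
```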

ID3 Algorithm (choosing the root)

Evaluating the information gain of each attribute on the full dataset:

Gain(D, Humidity) = 0.151 bits
Gain(D, Wind) = 0.048 bits
Gain(D, Temperature) = 0.029 bits
Gain(D, Outlook) = 0.246 bits

Outlook has the highest gain, so it becomes the root of the tree [9+,5-], with branches Sunny [2+,3-], Overcast [4+,0-], and Rain [3+,2-].

ID3 Algorithm (growing the subtrees)

The Overcast branch [4+,0-] is pure and becomes a Yes leaf. For the Sunny branch:

Gain(D_Sunny, Humidity) = 0.97 bits
Gain(D_Sunny, Wind) = 0.02 bits
Gain(D_Sunny, Temperature) = 0.57 bits

so Humidity is tested under Sunny (High -> No [0+,3-], Normal -> Yes [2+,0-]); similarly, Wind is tested under Rain (Strong -> No [0+,2-], Light -> Yes [3+,0-]), yielding the PlayTennis tree shown earlier.

Incremental learning of decision trees: ID3 cannot be trained incrementally; ID4, ID5, and ID5R are examples of incremental induction of decision trees.

Reading: P. E. Utgoff, "Incremental Induction of Decision Trees", Machine Learning, Vol. 4, pp. 161-186, 1989.
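Putting the pieces together, here is a compact sketch of the full ID3 recursion (my own simplification: no pruning and no missing-value handling; examples are dicts with a "label" key):

```python
import math
from collections import Counter

def entropy(examples):
    n = len(examples)
    counts = Counter(x["label"] for x in examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(examples, attribute):
    n = len(examples)
    rem = 0.0
    for v in {x[attribute] for x in examples}:
        subset = [x for x in examples if x[attribute] == v]
        rem += len(subset) / n * entropy(subset)
    return entropy(examples) - rem

def id3(examples, attributes):
    """Return a leaf label or an (attribute, {value: subtree}) node."""
    labels = {x["label"] for x in examples}
    if len(labels) == 1:                 # pure node -> leaf
        return labels.pop()
    if not attributes:                   # no tests left -> majority leaf
        return Counter(x["label"] for x in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))
    branches = {}
    for v in {x[best] for x in examples}:
        subset = [x for x in examples if x[best] == v]
        branches[v] = id3(subset, [a for a in attributes if a != best])
    return (best, branches)

# Usage: tree = id3(examples, ["Outlook", "Temperature", "Humidity", "Wind"])
```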

Inductive Bias in ID3

Types of biases:
1. Preference (search) bias: puts a priority on choosing among hypotheses.
2. Language bias: puts a restriction on the set of hypotheses considered.

Which bias is better?
1. Preference bias is more desirable,
2. because the learner works within a complete hypothesis space that is assured to contain the unknown concept.

Inductive bias of ID3:
1. Shorter trees are preferred over longer trees.
2. Occam's razor: prefer the simplest hypothesis that fits the data.
3. Trees that place high-information-gain attributes close to the root are preferred over those that do not.

Overfitting in ID3

How can we avoid overfitting?
1. Prevention
   - Stop training (growing) before the tree reaches the point where it overfits.
   - Select attributes that are relevant (i.e., will be useful in the decision tree); requires some predictive measure of relevance.
2. Avoidance
   - Allow the tree to overfit, then improve its generalization capability.
   - Hold out a validation set.
3. Detection and recovery
   - Let the problem happen, detect when it does, and recover afterward.
   - Build the model, then remove (prune) the elements that contribute to overfitting.

How to select the best tree?
1. Training and validation sets: use a separate set of examples (distinct from the training set) for evaluation.
2. Statistical test: use all data for training, but apply a statistical test to estimate the overfitting.
3. Complexity measure: define a measure of model complexity and halt growing when this measure is minimized.

Pruning algorithms

Reduced-error pruning.

[Figure: accuracy as a function of the number of nodes in the tree; training error keeps decreasing while the true error eventually increases.]

Pruning algorithms (reduced-error pruning)

A cross-validation approach using separate training and validation sets. Pruning a node replaces the subtree rooted at it with a leaf carrying the majority label of the associated examples.

Reduced-Error-Pruning(D)
  Partition D into D_train (training/growing) and D_validation (validation/pruning)
  Build tree T using ID3 on D_train
  UNTIL accuracy on D_validation decreases DO
    FOR each non-leaf node candidate in T
      Temp[candidate] <- Prune(T, candidate)
      Accuracy[candidate] <- Test(Temp[candidate], D_validation)
    T <- Temp with the best value of Accuracy (best increase; greedy)
  RETURN (pruned) T

[Figure: the effect of reduced-error pruning on error, showing accuracy on the training data, on the test data, and of the post-pruned tree on the test data, as a function of tree size.]
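The same greedy loop in Python (a sketch of mine; it uses a mutable dict-based tree, and all names are illustrative assumptions):

```python
from collections import Counter

# Mutable tree nodes: internal {"attr": A, "branches": {value: node}},
# leaf {"leaf": label}. Examples are dicts with a "label" key.

def classify(node, x):
    while "leaf" not in node:
        node = node["branches"][x[node["attr"]]]
    return node["leaf"]

def accuracy(tree, examples):
    return sum(classify(tree, x) == x["label"] for x in examples) / len(examples)

def internal_nodes(node, reaching):
    """Yield (node, training examples reaching it) for every internal node."""
    if "leaf" in node:
        return
    yield node, reaching
    for v, child in node["branches"].items():
        yield from internal_nodes(child, [x for x in reaching if x[node["attr"]] == v])

def reduced_error_prune(tree, d_train, d_validation):
    """Greedily prune while validation accuracy does not decrease.
    Assumes every node is reached by at least one training example."""
    best_acc = accuracy(tree, d_validation)
    while True:
        best = None
        for node, reaching in list(internal_nodes(tree, d_train)):
            saved = dict(node)
            majority = Counter(x["label"] for x in reaching).most_common(1)[0][0]
            node.clear(); node["leaf"] = majority        # tentatively prune
            acc = accuracy(tree, d_validation)
            node.clear(); node.update(saved)             # undo the prune
            if acc >= best_acc:
                best_acc, best = acc, (node, majority)
        if best is None:                                 # no prune helps -> stop
            return tree
        node, majority = best
        node.clear(); node["leaf"] = majority            # commit the best prune
```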

Pruning algorithms

1. Reduced-error pruning
2. Pessimistic error pruning
3. Minimum error pruning
4. Critical value pruning
5. Cost-complexity pruning

Reading:
1. F. Esposito, D. Malerba, and G. Semeraro, "A Comparative Analysis of Methods for Pruning Decision Trees", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 5, pp. 476-491, May 1997.
2. S. R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier Methodology", IEEE Trans. on Systems, Man, and Cybernetics, Vol. 21, No. 3, pp. 660-674, 1991.

Continuous Valued Attributes

Two methods for handling continuous attributes:
1. Discretization (e.g., histogramming): break real-valued attributes into ranges in advance.
   Example: high = {Temp > 35C}, med = {10C < Temp <= 35C}, low = {Temp <= 10C}.
2. Thresholded splits: a test A <= a produces the subsets A <= a and A > a. Information gain is calculated the same way as for discrete splits.

How to find the split with the highest gain? Sort the examples by the value of the attribute and consider candidate thresholds midway between adjacent examples whose labels differ (the original slide illustrates this with a small table of lengths, labels, and candidate thresholds).
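A sketch (mine) of the threshold search; the example values are illustrative stand-ins for the slide's length/label table:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(pairs):
    """pairs: (value, label) tuples for one continuous attribute.
    Returns (threshold, gain) for the binary split value <= threshold."""
    pairs = sorted(pairs)
    values = [v for v, _ in pairs]
    labels = [l for _, l in pairs]
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        if labels[i - 1] == labels[i]:
            continue                      # only label changes can be optimal
        t = (values[i - 1] + values[i]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        g = (base - len(left) / len(pairs) * entropy(left)
                  - len(right) / len(pairs) * entropy(right))
        if g > best[1]:
            best = (t, g)
    return best

# Illustrative data: the threshold 54.0 separates the two No's on the left.
print(best_threshold([(40, "No"), (48, "No"), (60, "Yes"),
                      (72, "Yes"), (80, "Yes"), (90, "No")]))
```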

Missing Data

Problem: what if some examples are missing values of attribute A? Attribute values may be unknown during training or testing, e.g., an example <..., Blood-Test = ?, ...>, because measuring them has low priority or costs too much.

- Training: we must evaluate Gain(D, A) even though for some x in D the value of A is not given.
- Testing: we must classify a new example x without knowing the value of A.

Solution: incorporate a guess for the missing value into the calculation of Gain(D, A).

Consider the PlayTennis dataset with the Humidity value of day 8 missing:

Day 8: Sunny, Mild, ???, Light -> No

What is the decision tree? The root split on Outlook [9+,5-] (Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]) is unchanged, but evaluating Gain(D_Sunny, Humidity) requires a guess for day 8's Humidity.
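One simple guess (a sketch of mine; the slides only name the strategy, so this "most common value among examples with the same label" rule is one standard instantiation):

```python
from collections import Counter

def fill_missing(examples, attribute, missing="???"):
    """Replace missing values of `attribute` with the most common value
    among examples that share the same label (a simple guess)."""
    by_label = {}
    for x in examples:
        if x[attribute] != missing:
            by_label.setdefault(x["label"], Counter())[x[attribute]] += 1
    filled = []
    for x in examples:
        if x[attribute] == missing:
            guess = by_label[x["label"]].most_common(1)[0][0]
            x = dict(x, **{attribute: guess})
        filled.append(x)
    return filled

# Among the other "No" examples the most common Humidity is High,
# so the missing value is guessed to be High.
data = [
    {"Humidity": "High", "label": "No"}, {"Humidity": "High", "label": "No"},
    {"Humidity": "Normal", "label": "No"}, {"Humidity": "???", "label": "No"},
]
print(fill_missing(data, "Humidity")[3])
```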

Attributes with Many Values

Problem: if an attribute has many values, such as Date, Gain(·) will select it. (Why? Splitting on it yields many small, nearly pure subsets, driving the second term of the gain toward zero.)

One approach: use GainRatio instead of Gain:

Gain(D, A) = H(D) - \sum_{v \in values(A)} \frac{|D_v|}{|D|} H(D_v)

GainRatio(D, A) = \frac{Gain(D, A)}{SplitInformation(D, A)}

SplitInformation(D, A) = -\sum_{v \in values(A)} \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|}

SplitInformation grows with the number of values of A, so dividing by it penalizes attributes with more values.

What is its inductive bias? A preference bias (for a lower branching factor), expressed via GainRatio(·).

An alternative attribute-selection measure: the Gini index.
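A sketch (mine) of GainRatio on the list-of-dicts layout used earlier:

```python
import math
from collections import Counter

def entropy(examples):
    n = len(examples)
    counts = Counter(x["label"] for x in examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def split_information(examples, attribute):
    """-sum_v |D_v|/|D| log2 |D_v|/|D|: the entropy of the split itself."""
    n = len(examples)
    counts = Counter(x[attribute] for x in examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain_ratio(examples, attribute):
    """Assumes the attribute takes at least two values (SplitInformation > 0)."""
    n = len(examples)
    rem = 0.0
    for v in {x[attribute] for x in examples}:
        subset = [x for x in examples if x[attribute] == v]
        rem += len(subset) / n * entropy(subset)
    return (entropy(examples) - rem) / split_information(examples, attribute)

# A Date-like attribute unique to each example maximizes Gain but also
# maximizes SplitInformation (log2 n), so its GainRatio stays modest.
```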

Handling Attributes With Different Costs

Problem: in some learning tasks the instance attributes may have associated costs.

Solutions (replace the gain criterion with a cost-sensitive one):

1. Extended ID3: Gain(S, A) / Cost(A)
2. Tan and Schlimmer: Gain^2(S, A) / Cost(A)
3. Nunez: \frac{2^{Gain(S, A)} - 1}{(Cost(A) + 1)^w}, where w \in [0, 1] is a constant.
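The three criteria side by side (a sketch of mine; the gain and cost values are illustrative):

```python
def extended_id3(gain, cost):
    return gain / cost

def tan_schlimmer(gain, cost):
    return gain ** 2 / cost

def nunez(gain, cost, w=0.5):
    """w in [0, 1] trades off information gain against attribute cost."""
    return (2 ** gain - 1) / (cost + 1) ** w

# A cheap test with modest gain can outrank an expensive, slightly
# better one under any of these criteria:
print(extended_id3(0.15, 1.0), extended_id3(0.25, 10.0))   # 0.15 vs 0.025
print(nunez(0.15, 1.0), nunez(0.25, 10.0))                 # ~0.078 vs ~0.057
```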

Regression Trees

The appropriate impurity measure for regression is the mean squared error over the subset of X reaching node m: in a regression tree, the goodness of a split is measured by the mean squared error from the estimated value.

[Figure: a model-tree node tests attribute A against a threshold A_1; the True branch predicts with a local model F_1(A - A_1) and the False branch with F_2(A - A_1).]

References:
1. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Belmont, CA: Wadsworth International Group, 1984.
2. D. Malerba, F. Esposito, M. Ceci, and A. Appice, "Top-Down Induction of Model Trees with Regression and Splitting Nodes", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 25, No. 5, May 2003.
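A sketch (mine) of split goodness by mean squared error, with each child predicted by its mean (the usual regression-tree leaf estimate):

```python
def mse(ys):
    """Mean squared error around the subset mean (the leaf's estimate)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def split_mse(xs, ys, threshold):
    """Weighted MSE of the two children for the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * mse(left) + len(right) / n * mse(right)

# Illustrative data with two plateaus around y = 1 and y = 5:
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.1, 0.9, 1.0, 5.2, 4.8, 5.0]
print(split_mse(xs, ys, 3.0))    # ~0.017: the split separates the plateaus
print(split_mse(xs, ys, 10.0))   # ~2.21: the left child mixes both plateaus
```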

Other types of decision trees

Univariate trees: the test at each internal node uses only one of the input attributes.

Multivariate trees: the test at each internal node can use all input attributes. For example, consider a dataset with numerical attributes: the test can be a weighted linear combination of some input attributes. Let X = (x_1, x_2) be the input attributes; then f(X) = w_0 + w_1 x_1 + w_2 x_2 can be used as the test at an internal node, such as f(X) > 0 (True -> YES, False -> NO).

Reading: C. E. Brodley and P. E. Utgoff, "Multivariate Decision Trees", Machine Learning, Vol. 19, pp. 45-77, 1995.
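A sketch (mine) of such a multivariate (oblique) test; the weights are illustrative, not learned:

```python
def linear_test(x, w0=-5.0, w1=1.0, w2=1.0):
    """Multivariate node test: route by the sign of w0 + w1*x1 + w2*x2."""
    return w0 + w1 * x[0] + w2 * x[1] > 0

# A univariate tree would need a staircase of axis-parallel splits to
# approximate this single oblique boundary x1 + x2 = 5.
print(linear_test((4.0, 2.0)))   # True  -> YES branch
print(linear_test((1.0, 1.0)))   # False -> NO branch
```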


More information

CSL465/603 - Machine Learning

CSL465/603 - Machine Learning CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am

More information

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1

Notes on The Sciences of the Artificial Adapted from a shorter document written for course (Deciding What to Design) 1 Notes on The Sciences of the Artificial Adapted from a shorter document written for course 17-652 (Deciding What to Design) 1 Ali Almossawi December 29, 2005 1 Introduction The Sciences of the Artificial

More information

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks

System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering

More information

Multi-label Classification via Multi-target Regression on Data Streams

Multi-label Classification via Multi-target Regression on Data Streams Multi-label Classification via Multi-target Regression on Data Streams Aljaž Osojnik 1,2, Panče Panov 1, and Sašo Džeroski 1,2,3 1 Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia 2 Jožef Stefan

More information

Constructive Induction-based Learning Agents: An Architecture and Preliminary Experiments

Constructive Induction-based Learning Agents: An Architecture and Preliminary Experiments Proceedings of the First International Workshop on Intelligent Adaptive Systems (IAS-95) Ibrahim F. Imam and Janusz Wnek (Eds.), pp. 38-51, Melbourne Beach, Florida, 1995. Constructive Induction-based

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Calibration of Confidence Measures in Speech Recognition

Calibration of Confidence Measures in Speech Recognition Submitted to IEEE Trans on Audio, Speech, and Language, July 2010 1 Calibration of Confidence Measures in Speech Recognition Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE

More information

Applications of data mining algorithms to analysis of medical data

Applications of data mining algorithms to analysis of medical data Master Thesis Software Engineering Thesis no: MSE-2007:20 August 2007 Applications of data mining algorithms to analysis of medical data Dariusz Matyja School of Engineering Blekinge Institute of Technology

More information

Softprop: Softmax Neural Network Backpropagation Learning

Softprop: Softmax Neural Network Backpropagation Learning Softprop: Softmax Neural Networ Bacpropagation Learning Michael Rimer Computer Science Department Brigham Young University Provo, UT 84602, USA E-mail: mrimer@axon.cs.byu.edu Tony Martinez Computer Science

More information

RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S

RANKING AND UNRANKING LEFT SZILARD LANGUAGES. Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A ER E P S I M S N S ER E P S I M TA S UN A I S I T VER RANKING AND UNRANKING LEFT SZILARD LANGUAGES Erkki Mäkinen DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A-1997-2 UNIVERSITY OF TAMPERE DEPARTMENT OF

More information

Edexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE

Edexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE Edexcel GCSE Statistics 1389 Paper 1H June 2007 Mark Scheme Edexcel GCSE Statistics 1389 NOTES ON MARKING PRINCIPLES 1 Types of mark M marks: method marks A marks: accuracy marks B marks: unconditional

More information

Multi-label classification via multi-target regression on data streams

Multi-label classification via multi-target regression on data streams Mach Learn (2017) 106:745 770 DOI 10.1007/s10994-016-5613-5 Multi-label classification via multi-target regression on data streams Aljaž Osojnik 1,2 Panče Panov 1 Sašo Džeroski 1,2,3 Received: 26 April

More information

Cooperative evolutive concept learning: an empirical study

Cooperative evolutive concept learning: an empirical study Cooperative evolutive concept learning: an empirical study Filippo Neri University of Piemonte Orientale Dipartimento di Scienze e Tecnologie Avanzate Piazza Ambrosoli 5, 15100 Alessandria AL, Italy Abstract

More information

Semi-Supervised Face Detection

Semi-Supervised Face Detection Semi-Supervised Face Detection Nicu Sebe, Ira Cohen 2, Thomas S. Huang 3, Theo Gevers Faculty of Science, University of Amsterdam, The Netherlands 2 HP Research Labs, USA 3 Beckman Institute, University

More information

Active Learning. Yingyu Liang Computer Sciences 760 Fall

Active Learning. Yingyu Liang Computer Sciences 760 Fall Active Learning Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed by Mark Craven,

More information

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System

QuickStroke: An Incremental On-line Chinese Handwriting Recognition System QuickStroke: An Incremental On-line Chinese Handwriting Recognition System Nada P. Matić John C. Platt Λ Tony Wang y Synaptics, Inc. 2381 Bering Drive San Jose, CA 95131, USA Abstract This paper presents

More information

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview

Algebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best

More information

Reducing Features to Improve Bug Prediction

Reducing Features to Improve Bug Prediction Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science

More information

Using focal point learning to improve human machine tacit coordination

Using focal point learning to improve human machine tacit coordination DOI 10.1007/s10458-010-9126-5 Using focal point learning to improve human machine tacit coordination InonZuckerman SaritKraus Jeffrey S. Rosenschein The Author(s) 2010 Abstract We consider an automated

More information

A. What is research? B. Types of research

A. What is research? B. Types of research A. What is research? Research = the process of finding solutions to a problem after a thorough study and analysis (Sekaran, 2006). Research = systematic inquiry that provides information to guide decision

More information

Learning Methods for Fuzzy Systems

Learning Methods for Fuzzy Systems Learning Methods for Fuzzy Systems Rudolf Kruse and Andreas Nürnberger Department of Computer Science, University of Magdeburg Universitätsplatz, D-396 Magdeburg, Germany Phone : +49.39.67.876, Fax : +49.39.67.8

More information

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C

Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Numeracy Medium term plan: Summer Term Level 2C/2B Year 2 Level 2A/3C Using and applying mathematics objectives (Problem solving, Communicating and Reasoning) Select the maths to use in some classroom

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Lab 1 - The Scientific Method

Lab 1 - The Scientific Method Lab 1 - The Scientific Method As Biologists we are interested in learning more about life. Through observations of the living world we often develop questions about various phenomena occurring around us.

More information

Axiom 2013 Team Description Paper

Axiom 2013 Team Description Paper Axiom 2013 Team Description Paper Mohammad Ghazanfari, S Omid Shirkhorshidi, Farbod Samsamipour, Hossein Rahmatizadeh Zagheli, Mohammad Mahdavi, Payam Mohajeri, S Abbas Alamolhoda Robotics Scientific Association

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

INPE São José dos Campos

INPE São José dos Campos INPE-5479 PRE/1778 MONLINEAR ASPECTS OF DATA INTEGRATION FOR LAND COVER CLASSIFICATION IN A NEDRAL NETWORK ENVIRONNENT Maria Suelena S. Barros Valter Rodrigues INPE São José dos Campos 1993 SECRETARIA

More information

NORTH CAROLINA VIRTUAL PUBLIC SCHOOL IN WCPSS UPDATE FOR FALL 2007, SPRING 2008, AND SUMMER 2008

NORTH CAROLINA VIRTUAL PUBLIC SCHOOL IN WCPSS UPDATE FOR FALL 2007, SPRING 2008, AND SUMMER 2008 E&R Report No. 08.29 February 2009 NORTH CAROLINA VIRTUAL PUBLIC SCHOOL IN WCPSS UPDATE FOR FALL 2007, SPRING 2008, AND SUMMER 2008 Authors: Dina Bulgakov-Cooke, Ph.D., and Nancy Baenen ABSTRACT North

More information

Multi-Lingual Text Leveling

Multi-Lingual Text Leveling Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency

More information

AP Calculus AB. Nevada Academic Standards that are assessable at the local level only.

AP Calculus AB. Nevada Academic Standards that are assessable at the local level only. Calculus AB Priority Keys Aligned with Nevada Standards MA I MI L S MA represents a Major content area. Any concept labeled MA is something of central importance to the entire class/curriculum; it is a

More information

Radius STEM Readiness TM

Radius STEM Readiness TM Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and

More information

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language

Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Defragmenting Textual Data by Leveraging the Syntactic Structure of the English Language Nathaniel Hayes Department of Computer Science Simpson College 701 N. C. St. Indianola, IA, 50125 nate.hayes@my.simpson.edu

More information

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department

More information

Measurement. When Smaller Is Better. Activity:

Measurement. When Smaller Is Better. Activity: Measurement Activity: TEKS: When Smaller Is Better (6.8) Measurement. The student solves application problems involving estimation and measurement of length, area, time, temperature, volume, weight, and

More information

Version Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18

Version Space. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Version Space Term 2012/ / 18 Version Space Javier Béjar cbea LSI - FIB Term 2012/2013 Javier Béjar cbea (LSI - FIB) Version Space Term 2012/2013 1 / 18 Outline 1 Learning logical formulas 2 Version space Introduction Search strategy

More information

Introduction to Questionnaire Design

Introduction to Questionnaire Design Introduction to Questionnaire Design Why this seminar is necessary! Bad questions are everywhere! Don t let them happen to you! Fall 2012 Seminar Series University of Illinois www.srl.uic.edu The first

More information

Improving Simple Bayes. Abstract. The simple Bayesian classier (SBC), sometimes called

Improving Simple Bayes. Abstract. The simple Bayesian classier (SBC), sometimes called Improving Simple Bayes Ron Kohavi Barry Becker Dan Sommereld Data Mining and Visualization Group Silicon Graphics, Inc. 2011 N. Shoreline Blvd. Mountain View, CA 94043 fbecker,ronnyk,sommdag@engr.sgi.com

More information

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS

COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS COMPUTER-ASSISTED INDEPENDENT STUDY IN MULTIVARIATE CALCULUS L. Descalço 1, Paula Carvalho 1, J.P. Cruz 1, Paula Oliveira 1, Dina Seabra 2 1 Departamento de Matemática, Universidade de Aveiro (PORTUGAL)

More information

Model Ensemble for Click Prediction in Bing Search Ads

Model Ensemble for Click Prediction in Bing Search Ads Model Ensemble for Click Prediction in Bing Search Ads Xiaoliang Ling Microsoft Bing xiaoling@microsoft.com Hucheng Zhou Microsoft Research huzho@microsoft.com Weiwei Deng Microsoft Bing dedeng@microsoft.com

More information

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

More information

The Good Judgment Project: A large scale test of different methods of combining expert predictions

The Good Judgment Project: A large scale test of different methods of combining expert predictions The Good Judgment Project: A large scale test of different methods of combining expert predictions Lyle Ungar, Barb Mellors, Jon Baron, Phil Tetlock, Jaime Ramos, Sam Swift The University of Pennsylvania

More information

Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation

Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation School of Computer Science Human-Computer Interaction Institute Carnegie Mellon University Year 2007 Predicting Students Performance with SimStudent: Learning Cognitive Skills from Observation Noboru Matsuda

More information

Sociology 521: Social Statistics and Quantitative Methods I Spring Wed. 2 5, Kap 305 Computer Lab. Course Website

Sociology 521: Social Statistics and Quantitative Methods I Spring Wed. 2 5, Kap 305 Computer Lab. Course Website Sociology 521: Social Statistics and Quantitative Methods I Spring 2012 Wed. 2 5, Kap 305 Computer Lab Instructor: Tim Biblarz Office hours (Kap 352): W, 5 6pm, F, 10 11, and by appointment (213) 740 3547;

More information

Learning From the Past with Experiment Databases

Learning From the Past with Experiment Databases Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University

More information

Math 96: Intermediate Algebra in Context

Math 96: Intermediate Algebra in Context : Intermediate Algebra in Context Syllabus Spring Quarter 2016 Daily, 9:20 10:30am Instructor: Lauri Lindberg Office Hours@ tutoring: Tutoring Center (CAS-504) 8 9am & 1 2pm daily STEM (Math) Center (RAI-338)

More information

SARDNET: A Self-Organizing Feature Map for Sequences

SARDNET: A Self-Organizing Feature Map for Sequences SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu

More information

Visit us at:

Visit us at: White Paper Integrating Six Sigma and Software Testing Process for Removal of Wastage & Optimizing Resource Utilization 24 October 2013 With resources working for extended hours and in a pressurized environment,

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming

Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de

More information

Getting Started with TI-Nspire High School Science

Getting Started with TI-Nspire High School Science Getting Started with TI-Nspire High School Science 2012 Texas Instruments Incorporated Materials for Institute Participant * *This material is for the personal use of T3 instructors in delivering a T3

More information

STT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point.

STT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. STT 231 Test 1 Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. 1. A professor has kept records on grades that students have earned in his class. If he

More information

Alignment of Australian Curriculum Year Levels to the Scope and Sequence of Math-U-See Program

Alignment of Australian Curriculum Year Levels to the Scope and Sequence of Math-U-See Program Alignment of s to the Scope and Sequence of Math-U-See Program This table provides guidance to educators when aligning levels/resources to the Australian Curriculum (AC). The Math-U-See levels do not address

More information

Multivariate k-nearest Neighbor Regression for Time Series data -

Multivariate k-nearest Neighbor Regression for Time Series data - Multivariate k-nearest Neighbor Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand ISF 2013, Seoul, Korea Fahad H. Al-Qahtani Dr. Sven F. Crone Management Science,

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

12- A whirlwind tour of statistics

12- A whirlwind tour of statistics CyLab HT 05-436 / 05-836 / 08-534 / 08-734 / 19-534 / 19-734 Usable Privacy and Security TP :// C DU February 22, 2016 y & Secu rivac rity P le ratory bo La Lujo Bauer, Nicolas Christin, and Abby Marsh

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email Marilyn A. Walker Jeanne C. Fromer Shrikanth Narayanan walker@research.att.com jeannie@ai.mit.edu shri@research.att.com

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Mining Association Rules in Student s Assessment Data

Mining Association Rules in Student s Assessment Data www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama

More information

TOPICS LEARNING OUTCOMES ACTIVITES ASSESSMENT Numbers and the number system

TOPICS LEARNING OUTCOMES ACTIVITES ASSESSMENT Numbers and the number system Curriculum Overview Mathematics 1 st term 5º grade - 2010 TOPICS LEARNING OUTCOMES ACTIVITES ASSESSMENT Numbers and the number system Multiplies and divides decimals by 10 or 100. Multiplies and divide

More information

FRAMEWORK FOR IDENTIFYING THE MOST LIKELY SUCCESSFUL UNDERPRIVILEGED TERTIARY STUDY BURSARY APPLICANTS

FRAMEWORK FOR IDENTIFYING THE MOST LIKELY SUCCESSFUL UNDERPRIVILEGED TERTIARY STUDY BURSARY APPLICANTS South African Journal of Industrial Engineering August 2017 Vol 28(2), pp 59-77 FRAMEWORK FOR IDENTIFYING THE MOST LIKELY SUCCESSFUL UNDERPRIVILEGED TERTIARY STUDY BURSARY APPLICANTS R. Steynberg 1 * #,

More information

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks

Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com

More information

Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations

Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations Katarzyna Stapor (B) Institute of Computer Science, Silesian Technical University, Gliwice, Poland katarzyna.stapor@polsl.pl

More information

A Bootstrapping Model of Frequency and Context Effects in Word Learning

A Bootstrapping Model of Frequency and Context Effects in Word Learning Cognitive Science 41 (2017) 590 622 Copyright 2016 Cognitive Science Society, Inc. All rights reserved. ISSN: 0364-0213 print / 1551-6709 online DOI: 10.1111/cogs.12353 A Bootstrapping Model of Frequency

More information