Overview
- Decision tree learning wrap-up
- Final exam review
- Final exam: Monday, May 6th, 10:30am to 12:30pm in Rm. 126 HRBB

Accuracy of Decision Trees
[Figure: learning curve, % correct on the test set (0.4 to 0.9) vs. training set size (0 to 100)]
- Divide the examples into training and test sets.
- Train using the training set.
- Measure the accuracy of the resulting decision tree on the test set.

Choosing the Best Attribute to Test First: Entropy and Information Gain
- Use Shannon's information theory to choose the attribute that gives the maximum information gain:

    Entropy(E) = -\sum_{i \in c} P_i \log_2 P_i

- Entropy measures the average surprisal of events; less probable events are more surprising.
- Pick the attribute such that the information gain (or entropy reduction) is maximized:

    Gain(E, A) = Entropy(E) - \sum_{v \in Values(A)} \frac{|E_v|}{|E|}\, Entropy(E_v)

- Notation: E is the set of examples, A a single attribute, E_v the subset of examples where attribute A = v, and |S| the cardinality of a set S.
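For concreteness, here is a minimal Python sketch of these two formulas. The representation (examples as (feature-dict, label) pairs) and the function names are illustrative choices, not from the course materials:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(E) = -sum_i P_i * log2(P_i) over the class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Gain(E, A) = Entropy(E) - sum_v |E_v|/|E| * Entropy(E_v)."""
    labels = [label for _, label in examples]
    gain = entropy(labels)
    for v in {f[attr] for f, _ in examples}:
        subset = [label for f, label in examples if f[attr] == v]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

# Toy example (hypothetical data): does "outlook" predict "play"?
data = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
        ({"outlook": "rain"}, "yes"), ({"outlook": "overcast"}, "yes")]
print(information_gain(data, "outlook"))  # 1.0: outlook separates the classes fully
```

Picking the attribute with the largest value of information_gain at each node is exactly the "best attribute to test first" rule above.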
Issues in Decision Tree Learning
- Noise and overfitting
- Missing attribute values in examples
- Multivalued attributes with a large number of possible values
- Continuous-valued attributes

Key Points
- Decision tree learning: what is the embodied principle (or bias)?
- How to choose the best attribute? Given a set of examples, choose the best attribute to test first.
- What are the issues? Noise, overfitting, etc.

Final Exam Review
- Predicate calculus (first-order logic): 30 points
- Probabilistic inference (including belief networks): 30 points
- Learning (including neural networks, GA, and decision trees): 40 points
- My research material (perceptual grouping) will not be on the exam.
- No Lisp programming problems.

Key Points: slide17
- Representing relations in predicate calculus: domains
- Interpretation in predicate calculus: what an interpretation is and how it relates to a domain; when an interpretation is true or false
- Prenex normal form: why it is useful, how to convert to it, and the basic rules used in the conversion
- Skolemization: why it is useful and how to do it (see the worked example below)
- Inference: the basics of resolution; the first step is converting to a standard form
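As a refresher on prenex conversion and skolemization, a small worked example (the formula itself is illustrative, not from the slides):

```latex
% Illustrative example: convert to prenex normal form, then skolemize.
\begin{align*}
  &\forall x\, \big( P(x) \rightarrow \exists y\, Q(x, y) \big)
    && \text{original formula} \\
  &\forall x\, \exists y\, \big( P(x) \rightarrow Q(x, y) \big)
    && \text{prenex normal form: quantifiers pulled to the front} \\
  &\forall x\, \big( P(x) \rightarrow Q(x, f(x)) \big)
    && \text{skolemized: } y \text{ replaced by } f(x),
       \text{ a function of the enclosing universal } x
\end{align*}
```

The Skolem function f records that the witness for y may depend on x; an existential with no enclosing universals is replaced by a plain Skolem constant.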
Key Points: slide18
- Skolemization: why it is useful and how to do it
- Substitution and unification: why these are necessary and how to do them
- Factors: definition, how to derive them, and why factors are important

Key Points: slide19
- Unification algorithm
- Resolvent: definition and how to derive it

Key Points: slide20
- Unification algorithm (a sketch follows these slides)
- Factors: definition, how to derive them, and why factors are important
- Resolvent: definition and how to derive it
- Theorem-proving algorithm: level saturation (the two-pointer method)

Key Points: slide21
- Resolvent: definition and how to derive it
- Properties of resolution: sound and complete
- Theorem proving: strategies for efficient resolution
- Advantages and disadvantages of resolution
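A minimal sketch of the unification algorithm, assuming terms are represented as nested tuples (functor, arg1, ...) and variables as strings beginning with "?"; this representation is an illustrative choice, not the course's:

```python
def is_var(t):
    """Variables are strings beginning with '?', e.g. '?x' (a representation choice)."""
    return isinstance(t, str) and t.startswith("?")

def substitute(t, s):
    """Apply substitution s (a dict mapping variables to terms) to term t."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):  # compound term: (functor, arg1, arg2, ...)
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t  # constant

def unify(a, b, s=None):
    """Return a most general unifier of a and b extending s, or None on failure.
    (The occurs check is omitted for brevity.)"""
    if s is None:
        s = {}
    a, b = substitute(a, s), substitute(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

# Unify P(?x, f(?y)) with P(a, f(b)): yields {'?x': 'a', '?y': 'b'}
print(unify(("P", "?x", ("f", "?y")), ("P", "a", ("f", "b"))))
```

Resolution applies exactly this: two clauses resolve on complementary literals when those literals unify, and the resulting substitution is applied to the resolvent.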
Key Points: slide23
- Application of a theorem prover: how to use it to answer questions
- Uncertainty: why can first-order logic fail in uncertain domains?
- Probability basics: terminology and notation
- Decision theory basics

Key Points: slide24
- Decision theory example: how probability theory and decision theory are combined
- Joint probability distribution: the concept
- Conditional probability: definition and the various ways of representing conditional probabilities
- Axioms of probability: the basic axioms, and using them to prove simple equalities
- Bayes' rule: definition and application (a worked example follows these slides)

Key Points: slide25
- Why and when is Bayesian analysis useful?
- How is subjective belief utilized in Bayesian analysis?
- How are priors calculated from conditional distributions?

Key Points: slide26
- How is subjective belief utilized in Bayesian analysis?
- Bayesian updating: why does it make probabilistic inference efficient when multiple pieces of evidence come in?
- Belief networks: definition, semantics, and extracting the probability of a given conjunction of events
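As a quick refresher on Bayes' rule and its application, with illustrative numbers (not from the slides):

```latex
% Bayes' rule, with the evidence term expanded by total probability.
\[
  P(H \mid E) \;=\; \frac{P(E \mid H)\, P(H)}{P(E)},
  \qquad
  P(E) \;=\; \sum_i P(E \mid H_i)\, P(H_i).
\]
% Illustrative instance: disease D with prior P(D) = 0.01, test with
% sensitivity P(+ \mid D) = 0.9 and false-positive rate P(+ \mid \neg D) = 0.05:
\[
  P(D \mid +)
    = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99}
    = \frac{0.009}{0.0585}
    \approx 0.154.
\]
```

Note how a strong test still yields a modest posterior when the prior is small; this is the kind of simple equality the axioms and Bayes' rule questions tend to probe.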
Key Points: slide27
- Constructing a belief network: what is the procedure? Why does node ordering matter? How should the nodes be ordered?
- Inference in belief networks: what are the kinds of inference? What is the general method?
- Knowledge engineering: how to formulate the idea and design a system

Key Points: slide28
- Types of learning
- Neural networks: the basics
- The central nervous system: how it differs from conventional computers

Key Points: slide29
- The central nervous system: how it differs from conventional computers
- The basic mechanism of synaptic information transfer
- Types of neural networks
- Perceptrons: the basic idea and the geometric interpretation. What is the limitation? How are they trained? (A training sketch follows these slides.)

Key Points: slide30
- The basic concept of a multi-layer feed-forward network
- How hidden units know how much error they caused
- Backprop is a gradient descent algorithm
- Drawbacks of backprop
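A minimal sketch of perceptron training on a toy AND problem, assuming the classic perceptron learning rule with a threshold unit (an illustrative sketch, not the course's code):

```python
def predict(w, b, x):
    """Threshold unit: fires iff w . x + b > 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(data, epochs=20, lr=0.1):
    """Perceptron rule: w <- w + lr * (target - output) * x.
    Geometrically, each update rotates the separating hyperplane
    toward the misclassified point."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in data:
            err = t - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]: AND is linearly separable
# XOR would never converge: no single hyperplane separates its classes,
# which is the perceptron's well-known limitation.
```

Multi-layer feed-forward networks trained with backprop overcome this limitation; backprop's credit assignment is what lets hidden units know how much error they caused.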
Key Points: slide31
- How can backprop be improved?
- What are the various ways to apply backprop?
- SOM: the basic algorithm
- What kinds of tasks is SOM good for?

Key Points: slide32
- SOM: the basic algorithm
- What kinds of tasks is SOM good for?
- Simple recurrent networks: how can they encode sequences? How do they differ from standard backprop, and how are they similar?

Key Points: slide33
- Simple recurrent networks: how can they encode sequences? How do they differ from standard backprop, and how are they similar?
- Genetic algorithms: the basics (a minimal sketch follows these slides)

Key Points: slide34
- Genetic algorithms: the basics
- What are the issues to be solved in genetic algorithms?
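A minimal genetic-algorithm sketch over bit strings, assuming the standard select/crossover/mutate loop; the specific choices (one-max fitness, tournament selection, one-point crossover, bit-flip mutation) are illustrative:

```python
import random

def fitness(ind):
    """Toy fitness: count of 1-bits (the 'one-max' problem)."""
    return sum(ind)

def tournament(pop, k=3):
    """Select the fittest of k randomly chosen individuals."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """One-point crossover of two parents."""
    p = random.randrange(1, len(a))
    return a[:p] + b[p:]

def mutate(ind, rate=0.01):
    """Flip each bit with small probability."""
    return [1 - g if random.random() < rate else g for g in ind]

def evolve(n=30, length=20, generations=50):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(n)]
    return max(pop, key=fitness)

best = evolve()
print(fitness(best), best)  # fitness should approach 20 (all ones)
```

The open issues listed on slide34 show up directly as design knobs here: how to encode individuals, how to define fitness, and how to balance selection pressure against mutation-driven diversity.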
Key Points: slide37
- Decision tree learning: what is the embodied principle (or bias)?
- How to choose the best attribute? Given a set of examples, choose the best attribute to test first.
- What are the issues? Noise, overfitting, etc.

Next Time
- Tuesday (redefined day): general Q and A (attendance not required)
- Recommended reading in AI, neuroscience, cognitive science, philosophy, etc.