Accuracy of Decision Trees. Overview. Entropy and Information Gain. Choosing the Best Attribute to Test First. Decision tree learning wrap up

Overview
- Decision tree learning wrap-up
- Final exam review
- Final exam: Monday, May 6th, 10:30am to 12:30pm in Rm. 126 HRBB.

Accuracy of Decision Trees
[Learning curve: % correct on the test set (roughly 0.4 to 0.9) plotted against training set size (0 to 100 examples).]
- Divide the examples into a training set and a test set.
- Train using the training set.
- Measure the accuracy of the resulting decision tree on the test set.

Choosing the Best Attribute to Test First
- Use Shannon's information theory to choose the attribute that gives the maximum information gain.
- Pick the attribute for which the information gain (or entropy reduction) is maximized:
  Gain(E, A) = Entropy(E) - \sum_{v \in Values(A)} \frac{|E_v|}{|E|} Entropy(E_v)
- E: the set of examples; A: a single attribute; E_v: the subset of examples where attribute A = v; |S|: the cardinality of set S.

Entropy and Information Gain
- Entropy(E) = -\sum_{i \in C} P_i \log_2 P_i, where C is the set of classes and P_i is the proportion of examples in class i.
- Entropy measures the average surprisal of events; less probable events are more surprising. (A short code sketch of these formulas follows.)
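As an illustration of the Entropy and Gain formulas above, here is a minimal Python sketch. It is not part of the original slides; the dataset, the attribute name "outlook", and the helper names are made up for the example.

```python
# Minimal sketch of Entropy(E) and Gain(E, A) from the slide above.
# Examples are (attributes_dict, class_label) pairs; the data below is hypothetical.
import math
from collections import Counter

def entropy(examples):
    """Entropy(E) = -sum_i P_i * log2(P_i) over the class labels in E."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, value_of):
    """Gain(E, A) = Entropy(E) - sum_v (|E_v| / |E|) * Entropy(E_v),
    where value_of(example) returns the value of attribute A for that example."""
    total = len(examples)
    remainder = 0.0
    for v in {value_of(ex) for ex in examples}:
        subset = [ex for ex in examples if value_of(ex) == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

data = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
        ({"outlook": "overcast"}, "yes"), ({"outlook": "rain"}, "yes")]
print(information_gain(data, value_of=lambda ex: ex[0]["outlook"]))   # 1.0 for this toy data
```

The decision-tree learner would compute this gain for every candidate attribute and test the one with the largest value first.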

Issues in Decision Tree Learning
- Noise and overfitting
- Missing attribute values in examples
- Multivalued attributes with a large number of possible values
- Continuous-valued attributes

Key Points
- Decision tree learning: What is the embodied principle (or bias)? How do you choose the best attribute? Given a set of examples, choose the best attribute to test first.
- What are the issues? Noise, overfitting, etc.

Final Exam Review
- Predicate calculus (first-order logic): 30 points
- Probabilistic inference (including belief networks): 30 points
- Learning (including neural networks, GA, and decision trees): 40 points
- My research material (perceptual grouping) will not be on the exam.
- No Lisp programming problems.

Key Points: slide 17
- Representing relations in predicate calculus: domains, ...
- Interpretations in predicate calculus: what an interpretation is and how it relates to a domain; when an interpretation is true or false
- Prenex normal form: why it is useful, how to convert to it, and the basic rules used in the conversion
- Skolemization: why it is useful and how to do it
- Inference: basics of resolution; the first step is converting to a standard form

Key Points: slides 18-19
- Skolemization: why it is useful and how to do it
- Substitution and unification: why these are necessary and how to do them
- The unification algorithm (a small sketch follows below)
- Factors: definition, how to derive them, and why they are important
- Resolvent: definition and how to derive it

Key Points: slides 20-21
- The unification algorithm; factors and resolvents (continued)
- Properties of resolution: sound and complete
- Theorem-proving algorithm: level saturation (the two-pointer method)
- Theorem proving: strategies for efficient resolution
- Advantages and disadvantages of resolution
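For the unification-algorithm key point above, a compact illustration may help. This is my own sketch, not course material: terms are nested tuples, variables are strings beginning with "?", and the occurs check stops a variable from unifying with a term that contains it.

```python
# Sketch of basic unification. unify(t1, t2) returns a substitution dict or None.
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings until we reach a non-variable or an unbound variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    t = walk(t, subst)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, a, subst) for a in t)

def bind(var, t, subst):
    if occurs(var, t, subst):
        return None                      # occurs check: ?x cannot unify with f(?x)
    return {**subst, var: t}

def unify(t1, t2, subst=None):
    if subst is None:
        subst = {}
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        return bind(t1, t2, subst)
    if is_var(t2):
        return bind(t2, t1, subst)
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):         # unify argument by argument, threading the substitution
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(("knows", "?x", "mary"), ("knows", "john", "?y")))
# -> {'?x': 'john', '?y': 'mary'}
```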

Key Points: slides 23-24
- Application of a theorem prover: how to use it to answer questions
- Uncertainty: why can first-order logic fail in uncertain domains?
- Decision theory example: how probability theory and decision theory are combined
- Probability basics: terminology and notation
- Decision theory basics
- Joint probability distribution: the concept
- Conditional probability: definition, and various ways of representing conditional probabilities
- Axioms of probability: the basic axioms, and using them to prove simple equalities
- Bayes' rule: definition and application

Key Points: slides 25-26
- Why and when is Bayesian analysis useful?
- How is subjective belief utilized in Bayesian analysis?
- How to calculate priors from conditional distributions
- Bayesian updating: why does it make probabilistic inference efficient when multiple pieces of evidence come in? (See the sketch below.)
- Belief networks: definition, semantics, and extracting probabilities of conjunctions of events
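For Bayes' rule and Bayesian updating, a tiny numeric sketch (with made-up numbers, not from the course) shows how a posterior is revised as each piece of evidence arrives. It assumes the evidence items are conditionally independent given the hypothesis, which is what makes the one-at-a-time update valid.

```python
# Bayesian updating sketch: P(h | e1, e2) via repeated application of Bayes' rule.
prior = {"disease": 0.01, "healthy": 0.99}
likelihoods = [                                  # P(test positive | hypothesis), illustrative numbers
    {"disease": 0.95, "healthy": 0.05},          # test 1 comes back positive
    {"disease": 0.90, "healthy": 0.10},          # test 2 comes back positive
]

posterior = dict(prior)
for lik in likelihoods:                          # fold in one piece of evidence at a time
    unnorm = {h: posterior[h] * lik[h] for h in posterior}
    z = sum(unnorm.values())                     # P(evidence), obtained by normalization
    posterior = {h: p / z for h, p in unnorm.items()}

print(posterior)                                 # roughly {'disease': 0.63, 'healthy': 0.37}
```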

Key Points: slides 27-28
- Constructing a belief network: what is the procedure? why does node ordering matter? how do you order the nodes?
- Inference in belief networks: what are the kinds of inference? what is the general method?
- Knowledge engineering: how to formulate the idea and design a system
- Types of learning
- Neural networks: basics
- The central nervous system: how it differs from conventional computers

Key Points: slides 29-30
- Basic mechanism of synaptic information transfer
- Types of neural networks
- Basic concept of a multi-layer feed-forward network
- How hidden units know how much error they caused
- Backprop is a gradient descent algorithm (see the sketch below)
- Drawbacks of backprop
- Perceptrons: the basic idea and the geometric interpretation. What is the limitation? How are they trained?
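To illustrate "backprop is a gradient descent algorithm" and how hidden units get their share of the error, here is a minimal sketch of my own (not from the slides): a small sigmoid network trained on XOR with plain batch gradient descent. The layer sizes, learning rate, and iteration count are arbitrary choices.

```python
# Backprop sketch: 2-4-1 sigmoid network learning XOR by gradient descent on squared error.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden (4 hidden units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the delta terms measure how much error each unit "caused"
    d_out = (out - y) * out * (1 - out)           # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)            # hidden-layer deltas via the chain rule
    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out, 2))   # should end up close to [[0], [1], [1], [0]]
```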

Key Points: slides 31-32
- How can backprop be improved?
- What are the various ways to apply backprop?
- SOM: the basic algorithm (see the sketch below)
- What kinds of tasks is SOM good for?
- Simple recurrent networks: how can they encode sequences? how are they different from standard backprop, and how similar?

Key Points: slides 33-34
- Genetic algorithms: basics
- What are the issues to be solved in genetic algorithms?
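For the SOM basic algorithm, the following sketch (illustrative only, not course material) shows the core loop: pick an input, find the best-matching unit, and move that unit and its map neighbors toward the input, with the learning rate and neighborhood radius decaying over time.

```python
# Self-organizing map sketch: a 1-D map of 20 units learning 2-D inputs.
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((500, 2))                # 2-D inputs in the unit square (made-up data)
n_units = 20
weights = rng.random((n_units, 2))         # one weight vector per map unit

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]
    lr = 0.5 * (1 - t / n_steps)                           # decaying learning rate
    radius = max(1.0, n_units / 2 * (1 - t / n_steps))     # decaying neighborhood radius
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best-matching unit
    dist = np.abs(np.arange(n_units) - bmu)                # distance on the map, not in input space
    influence = np.exp(-(dist ** 2) / (2 * radius ** 2))
    weights += lr * influence[:, None] * (x - weights)     # pull BMU and neighbors toward x

print(weights[:5])   # neighboring units should end up with similar weight vectors
```

The neighborhood term is what gives the map its topology: units that are close on the map get pulled toward similar inputs, which is why SOMs are useful for clustering and low-dimensional visualization.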

Key Points: slide 37
- Decision tree learning: What is the embodied principle (or bias)? How do you choose the best attribute? Given a set of examples, choose the best attribute to test first.
- What are the issues? Noise, overfitting, etc.

Next Time
- Tuesday (redefined day): general Q&A (attendance not required).
- Recommended reading in AI, neuroscience, cognitive science, philosophy, etc.