# Supervised learning can be done by choosing the hypothesis that is most probable given the data: h* = argmax_h P(h | data) = argmax_h P(data | h) P(h)


The learning problem is called realizable if the hypothesis space contains the true function; otherwise it is unrealizable. On the other hand, in the name of better generalization ability it may be sensible to trade off exactness of fitting for simplicity of the hypothesis. In other words, it may be sensible to be content with a hypothesis that fits the data less perfectly, as long as it is simple. The hypothesis space needs to be restricted so that finding a hypothesis that fits the data stays computationally efficient. Machine learning concentrates on learning relatively simple knowledge representations.

MAT Artificial Intelligence, Spring Feb

Supervised learning can be done by choosing the hypothesis that is most probable given the data: h* = argmax_h P(h | data). By Bayes' rule this is equivalent to h* = argmax_h P(data | h) P(h). Then we can say that the prior probability P(h) is high for a degree-1 or -2 polynomial, lower for a degree-7 polynomial, and especially low for a degree-7 polynomial with large, sharp spikes. There is a tradeoff between the expressiveness of a hypothesis space and the complexity of finding a good hypothesis within that space.
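The MAP selection rule h* = argmax_h P(data | h) P(h) can be sketched in a few lines. The hypothesis names, priors, and likelihoods below are hypothetical numbers chosen only to illustrate the argmax computation, not values from the slides.

```python
# MAP hypothesis selection: h* = argmax_h P(data | h) * P(h).
# Priors favor simpler polynomials; likelihoods favor closer fits.
# All numbers here are illustrative assumptions.
priors = {"degree-1": 0.5, "degree-2": 0.3, "degree-7": 0.2}
likelihoods = {"degree-1": 0.10, "degree-2": 0.30, "degree-7": 0.35}

def map_hypothesis(priors, likelihoods):
    """Return the hypothesis maximizing P(data | h) * P(h)."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

print(map_hypothesis(priors, likelihoods))  # "degree-2": 0.3 * 0.3 = 0.09 is largest
```

Note how the prior acts as a complexity penalty: degree-7 fits best (highest likelihood) but its low prior keeps it from winning.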

## 18.3 Learning Decision Trees

A decision tree takes as input an object or situation described by a set of attributes. It returns a decision, the predicted output value for the input. If the output values are discrete, then the decision tree classifies the inputs; learning a continuous function is called regression. Each internal node in the tree corresponds to a test of the value of one of the properties, and the branches from the node are labeled with the possible values of the test. Each leaf node specifies the value to be returned if the leaf is reached. To process an input, it is directed from the root of the tree through internal nodes to a leaf, which determines the output value.

[Figure: the restaurant decision tree for deciding whether to wait for a table. The root tests Patrons? (None / Some / Full); under Full, WaitEstimate? (e.g., > 60, ..., 0-10) leads to further tests of Alternate?, Hungry?, Reservation?, Bar?, Fri/Sat?, and Raining?.]
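The root-to-leaf processing described above can be sketched with a tree represented as nested dicts; internal nodes name an attribute and map each of its values to a subtree, while leaves are output values. The tree fragment below is a hypothetical, simplified piece of the restaurant example, not the full tree.

```python
# A decision tree as nested dicts: internal nodes hold an attribute name and
# branches keyed by attribute value; leaves are plain output values.
# This fragment of the restaurant tree is a simplified assumption.
tree = {
    "attr": "Patrons",
    "branches": {
        "None": "No",
        "Some": "Yes",
        "Full": {
            "attr": "Hungry",
            "branches": {"Yes": "Yes", "No": "No"},
        },
    },
}

def classify(node, example):
    """Follow branches from the root until a leaf determines the output."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attr"]]]
    return node

print(classify(tree, {"Patrons": "Full", "Hungry": "Yes"}))  # prints "Yes"
```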

A decision tree (of reasonable size) is an easy-to-comprehend way of representing knowledge. It is important in practice and heuristically learnable. The previous decision tree corresponds to the goal predicate of whether to wait for a table in a restaurant. Its goal predicate can be seen as an assertion of the form ∀s: WillWait(s) ⇔ (P_1(s) ∨ P_2(s) ∨ ... ∨ P_n(s)), where each P_i(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome. An exponentially large decision tree can express any Boolean function. Typically, decision trees can represent many functions with much smaller trees. For some kinds of functions this, however, is a real problem; e.g., xor and majority need exponentially large decision trees. Decision trees, like any other knowledge representation, are good for some kinds of functions and bad for others. Consider the set of all Boolean functions on n attributes. How many different functions are in this set? The truth table has 2^n rows, so there are 2^(2^n) different functions. For example, when n = 6 there are 2^64 ≈ 1.8 × 10^19 such functions, and when n = 20 there are 2^(2^20) > 10^300000. We will need some ingenious algorithms to find consistent hypotheses in such a large space.
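The double-exponential count is easy to check directly, since each of the 2^n truth-table rows can independently receive either output:

```python
# With n Boolean attributes the truth table has 2**n rows, and each function
# assigns one of two outputs to every row, giving 2**(2**n) functions.
def num_boolean_functions(n):
    return 2 ** (2 ** n)

print(num_boolean_functions(6))  # 2**64 = 18446744073709551616, about 1.8e19
```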

### Top-down induction of decision trees

The input to the algorithm is a training set, which consists of examples (x, y), where x is a vector of input attribute values and y is the single output value (class value) attached to them. We could simply construct a consistent decision tree that has one path from the root to a leaf for each example. Then we would be able to classify all training examples correctly, but the tree would not be able to generalize at all. Applying Occam's razor, we should instead find the smallest decision tree that is consistent with the examples. Unfortunately, for any reasonable definition of "smallest", finding the smallest tree is an intractable problem. Successful decision tree learning algorithms are based on simple heuristics and do a good job of finding a smallish tree. The basic idea is to test the most important attribute first. Because the aim is to classify instances, the most important attribute is the one that makes the most difference to the classification of an example. Actual decision tree construction happens with a recursive algorithm: first the most important attribute is chosen for the root of the tree, then the training data is divided according to the values of the chosen attribute, and (sub)tree construction continues using the same idea.

GROW-CONS-TREE(E, A)
Input: a set E of training examples on attributes A
Output: a decision tree that is consistent with E

```
if all examples in E have class c then
    return a one-leaf tree labeled by c
else
    select an attribute a from A
    partition E into E_1, ..., E_k by the values of a
    for i = 1 to k do
        T_i = GROW-CONS-TREE(E_i, A \ {a})
    return a tree that has a in its root and
        T_i as its i-th subtree
```

If there are no examples left, no such example has been observed, and we return a default value calculated from the majority classification at the node's parent (or the majority classification at the root). If there are no attributes left but still instances of several classes in the remaining portion of the data, these examples have exactly the same description but different classifications. Then we say that there is noise in the data. Noise may arise either when the attributes do not give enough information to describe the situation fully, or when the domain is truly nondeterministic. One simple way out of this problem is to use a majority vote.
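The pseudocode above, including both fallback cases, can be sketched in Python. Attribute selection here is just "first remaining attribute"; the information-gain heuristic of the next section would replace it. Representing examples as (attrs, label) pairs with attrs a dict is an assumption made for illustration.

```python
from collections import Counter

def majority(examples):
    """Majority-vote class among (attrs, label) examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def grow_tree(examples, attributes, default=None):
    if not examples:                      # no examples left: inherit the
        return default                    # parent's majority classification
    labels = {label for _, label in examples}
    if len(labels) == 1:                  # all examples share one class
        return labels.pop()
    if not attributes:                    # noise: fall back to majority vote
        return majority(examples)
    a = attributes[0]                     # select an attribute (simplistic)
    tree = {"attr": a, "branches": {}}
    for v in {attrs[a] for attrs, _ in examples}:   # partition E by values of a
        subset = [(x, y) for x, y in examples if x[a] == v]
        tree["branches"][v] = grow_tree(subset, attributes[1:], majority(examples))
    return tree
```

For example, `grow_tree([({"A": 0}, "no"), ({"A": 1}, "yes")], ["A"])` returns a one-level tree that splits on A.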

### Choosing attribute tests

The idea is to pick the attribute that goes as far as possible toward providing an exact classification of the examples. A perfect attribute divides the examples into sets that contain only instances of one class. A really useless attribute leaves the example sets with roughly the same proportion of instances of all classes as the original set. To measure the usefulness of attributes we can use, for instance, the expected amount of information provided by the attribute, i.e., its Shannon entropy. Information theory measures information content in bits. One bit of information is enough to answer a yes/no question about which one has no idea, such as the flip of a fair coin. In general, if the possible answers v_i have probabilities P(v_i), then the entropy of the actual answer is H(P(v_1), ..., P(v_n)) = -Σ_i P(v_i) log2 P(v_i). For example, H(0.5, 0.5) = -2(0.5 log2 0.5) = 1 bit. In choosing attribute tests, we want to calculate the change in the value distribution P(C) of the class attribute when the training set E is divided into subsets according to the value of attribute A: Gain(A) = H(P(C)) − H(P(C) | A), where H(P(C) | A) = Σ_i (|E_i| / |E|) H(P(C | E_i)) when A divides E into subsets E_1, ..., E_k.

Let the training set E contain 14 positive and 6 negative examples; hence H(P(C)) = H(0.7, 0.3). Suppose that attribute A divides the data s.t. E_1 = {7+, 3−}, E_2 = {7+}, and E_3 = {3−}. Then H(P(C) | A) = Σ_i (|E_i| / |E|) H(P(C | E_i)) = (10/20) H(0.7, 0.3) + 0 + 0 ≈ ½ H(0.7, 0.3).

### Assessing the performance of learning algorithms

Divide the set of examples into disjoint training and test sets. Apply the training algorithm to the training set, generating a hypothesis h. Measure the percentage of examples in the test set that are correctly classified by h: h(x) = y for an example (x, y). Repeat the above-mentioned steps for different sizes of training sets and different randomly selected training sets of each size. The result of this procedure is a set of data that can be processed to give the average prediction quality as a function of the size of the training set. Plotting this function on a graph gives the learning curve. An alternative (better) approach to testing is cross-validation.
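The arithmetic of the worked example (14 positive and 6 negative examples, with half of the data split off into pure subsets) can be checked directly:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Before the split: 14 positive, 6 negative examples.
h_before = entropy([14 / 20, 6 / 20])              # H(0.7, 0.3) ~ 0.881 bits
# After splitting on A: E_1 = {7+, 3-}, E_2 = {7+}, E_3 = {3-};
# the pure subsets E_2 and E_3 contribute zero entropy.
h_after = (10 / 20) * entropy([0.7, 0.3]) + 0 + 0  # ~ 0.441 bits

print(round(h_before, 3), round(h_after, 3))  # 0.881 0.441
```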

The idea in k-fold cross-validation is that each example serves double duty, as training data and as test data. First we split the data into k equal subsets. We then perform k rounds of learning; on each round 1/k of the data is held out as a test set and the remaining examples are used as training data. The average test set score of the k rounds should then be a better estimate than a single score. Popular values for k are 5 and 10, enough to give an estimate that is statistically likely to be accurate, at the cost of 5 to 10 times longer computation time. The extreme is k = n, also known as leave-one-out cross-validation (LOO[CV], or jackknife).

### Generalization and overfitting

If there are two or more examples with the same description (in terms of the attributes) but different classifications, no consistent decision tree exists. The solution is to have each leaf node report either the majority classification for its set of examples, if a deterministic hypothesis is required, or the estimated probabilities of each classification using the relative frequencies. It is quite possible, and in fact likely, that even when vital information is missing, the learning algorithm will find a consistent decision tree. This is because the algorithm can use irrelevant attributes, if any, to make spurious distinctions among the examples.
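The k rounds of hold-out-and-train can be sketched as follows. The learner here is a majority-class baseline, an assumption purely so the sketch is self-contained; any `train` function returning a classifier would slot in.

```python
# Minimal k-fold cross-validation. Folds are taken by striding through the
# data (every k-th example); a shuffled split would be used in practice.
def majority_class(examples):
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def cross_validate(examples, k, train=majority_class):
    scores = []
    for i in range(k):                   # round i holds out fold i
        test = examples[i::k]
        training = [e for j, e in enumerate(examples) if j % k != i]
        model = train(training)          # here: a constant prediction
        correct = sum(1 for _, y in test if y == model)
        scores.append(correct / len(test))
    return sum(scores) / k               # average of the k test-set scores

data = [({}, "yes")] * 7 + [({}, "no")] * 3
print(cross_validate(data, 5))
```

Setting `k = len(examples)` gives leave-one-out cross-validation.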

Consider trying to predict the roll of a die on the basis of the day and the month in which the die was rolled, and the color of the die. Then, as long as no two examples have identical descriptions, the learning algorithm will find an exact hypothesis. Such a hypothesis will be totally spurious. The more attributes there are, the more likely it is that an exact hypothesis will be found. The correct tree to return would be a single leaf node with probabilities close to 1/6 for each roll. This problem is an example of overfitting, a very general phenomenon afflicting every kind of learning algorithm and target function, not only random concepts.

### Decision tree pruning

A simple approach to dealing with overfitting is to prune the decision tree. Pruning works by preventing recursive splitting on attributes that are not clearly relevant. Suppose we split a set of examples using an irrelevant attribute. Generally, we would expect the resulting subsets to have roughly the same proportions of each class as the original set, in which case the information gain will be close to zero. How large a gain should we require in order to split on a particular attribute?

A statistical significance test begins by assuming that there is no underlying pattern (the so-called null hypothesis) and then analyzes the actual data to calculate the extent to which it deviates from a perfect absence of pattern. If the degree of deviation is statistically unlikely (usually taken to mean a 5% probability or less), then that is considered good evidence for the presence of a significant pattern in the data. The probabilities are calculated from standard distributions of the amount of deviation one would expect to see in random sampling. The null hypothesis here is that the attribute at hand is irrelevant and, hence, its information gain for an infinitely large sample is zero. We need to calculate the probability that, under the null hypothesis, a sample of size p + n would exhibit the observed deviation from the expected distribution of examples. Let the numbers of positive and negative examples in each subset E_k be p_k and n_k, respectively. Their expected values, assuming true irrelevance, are

p̂_k = p × (p_k + n_k) / (p + n)
n̂_k = n × (p_k + n_k) / (p + n),

where p and n are the total numbers of positive and negative examples in the training set. A convenient measure of the total deviation is given by

D = Σ_k [ (p_k − p̂_k)² / p̂_k + (n_k − n̂_k)² / n̂_k ].

Under the null hypothesis, the value of D is distributed according to the χ² (chi-squared) distribution with (v − 1) degrees of freedom, where v is the number of values of the attribute. The probability that the attribute is really irrelevant can then be calculated with the help of standard χ² tables.
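The deviation measure D can be computed straightforwardly. The subset counts below are hypothetical, and 5.99 is the standard 5% critical value of the χ² distribution with 2 degrees of freedom (an attribute with v = 3 values).

```python
# Chi-squared pre-pruning test: total deviation D of the observed subset
# counts from the counts expected under the "attribute is irrelevant" null.
def chi_squared_deviation(subsets):
    """subsets: list of (p_k, n_k) positive/negative counts per value."""
    p = sum(pk for pk, _ in subsets)     # total positives
    n = sum(nk for _, nk in subsets)     # total negatives
    d = 0.0
    for pk, nk in subsets:
        expected_p = p * (pk + nk) / (p + n)
        expected_n = n * (pk + nk) / (p + n)
        d += (pk - expected_p) ** 2 / expected_p \
           + (nk - expected_n) ** 2 / expected_n
    return d

d = chi_squared_deviation([(7, 3), (7, 0), (0, 3)])
print(round(d, 2), d > 5.99)  # 10.0 True: deviation large enough to split
```

An attribute whose subsets mirror the overall class proportions yields D near zero and would be pruned.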

The above method is known as χ² (pre-)pruning. Pruning allows the training examples to contain noise, and it also reduces the size of the decision trees and makes them more comprehensible. More common than pre-pruning are post-pruning methods, in which one first constructs a decision tree that is as consistent as possible with the training data and then removes those subtrees that have likely been added due to noise. In cross-validation the known data is divided into k parts, each of which is used in its turn as a test set for a decision tree that has been grown on the other k − 1 subsets. Thus one can approximate how well each hypothesis will predict unseen data.

### Broadening the applicability of decision trees

In practice decision tree learning also has to answer the following questions:

- Missing attribute values, both while learning and when classifying instances
- Multivalued discrete attributes: value subsetting, or penalizing attributes with too many values
- Numerical attributes: split point selection for interval division
- Continuous-valued output attributes

Decision trees are used widely and many good implementations are available (for free). Decision trees fulfill the requirement of understandability, contrary to neural networks; understandability is a legal requirement for financial decisions.


### Course 395: Machine Learning Lectures

Course 395: Machine Learning Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic) Lecture 5-6: Artificial Neural Networks (THs) Lecture 7-8: Instance Based

### Statistics 2000, Section 001, Final (300 Points) Part I: Text Answers. Your Name:

Statistics 2000, Section 001, Final (300 Points) Wednesday, May 4, 2011 Part I: Text Answers Your Name: Question 1: Statistical Inference (68 Points) Eight people volunteered to be part of an experiment.

### Ensemble Learning. Synonyms. Definition. Main Body Text. Zhi-Hua Zhou. Committee-based learning; Multiple classifier systems; Classifier combination

Ensemble Learning Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China zhouzh@nju.edu.cn Synonyms Committee-based learning; Multiple classifier

### TOWARDS DATA-DRIVEN AUTONOMICS IN DATA CENTERS

TOWARDS DATA-DRIVEN AUTONOMICS IN DATA CENTERS ALINA SIRBU, OZALP BABAOGLU SUMMARIZED BY ARDA GUMUSALAN MOTIVATION 2 MOTIVATION Human-interaction-dependent data centers are not sustainable for future data

### A Quantitative Study of Small Disjuncts in Classifier Learning

Submitted 1/7/02 A Quantitative Study of Small Disjuncts in Classifier Learning Gary M. Weiss AT&T Labs 30 Knightsbridge Road, Room 31-E53 Piscataway, NJ 08854 USA Keywords: classifier learning, small

### Bias and the Probability of Generalization

Brigham Young University BYU ScholarsArchive All Faculty Publications 1997-12-10 Bias and the Probability of Generalization Tony R. Martinez martinez@cs.byu.edu D. Randall Wilson Follow this and additional

### Performance Analysis of Various Data Mining Techniques on Banknote Authentication

International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 Volume 5 Issue 2 February 2016 PP.62-71 Performance Analysis of Various Data Mining Techniques on

### IAI : Machine Learning

IAI : Machine Learning John A. Bullinaria, 2005 1. What is Machine Learning? 2. The Need for Learning 3. Learning in Neural and Evolutionary Systems 4. Problems Facing Expert Systems 5. Learning in Rule

### I400 Health Informatics Data Mining Instructions (KP Project)

I400 Health Informatics Data Mining Instructions (KP Project) Casey Bennett Spring 2014 Indiana University 1) Import: First, we need to import the data into Knime. add CSV Reader Node (under IO>>Read)

### CS545 Machine Learning

Machine learning and related fields CS545 Machine Learning Course Introduction Machine learning: the construction and study of systems that learn from data. Pattern recognition: the same field, different

### Machine Learning :: Introduction. Konstantin Tretyakov

Machine Learning :: Introduction Konstantin Tretyakov (kt@ut.ee) MTAT.03.183 Data Mining November 5, 2009 So far Data mining as knowledge discovery Frequent itemsets Descriptive analysis Clustering Seriation

### Concession Curve Analysis for Inspire Negotiations

Concession Curve Analysis for Inspire Negotiations Vivi Nastase SITE University of Ottawa, Ottawa, ON vnastase@site.uottawa.ca Gregory Kersten John Molson School of Business Concordia University, Montreal,

Gradual Forgetting for Adaptation to Concept Drift Ivan Koychev GMD FIT.MMK D-53754 Sankt Augustin, Germany phone: +49 2241 14 2194, fax: +49 2241 14 2146 Ivan.Koychev@gmd.de Abstract The paper presents

### 1 Subject. 2 Dataset. 3 Descriptive statistics. 3.1 Data importation. SIPINA proposes some descriptive statistics functionalities.

1 Subject proposes some descriptive statistics functionalities. In itself, the information is not really exceptional; there is a large number of freeware which do that. It becomes more interesting when

### A Classification Method using Decision Tree for Uncertain Data

A Classification Method using Decision Tree for Uncertain Data Annie Mary Bhavitha S 1, Sudha Madhuri 2 1 Pursuing M.Tech(CSE), Nalanda Institute of Engineering & Technology, Siddharth Nagar, Sattenapalli,

### Optimizing Conversations in Chatous s Random Chat Network

Optimizing Conversations in Chatous s Random Chat Network Alex Eckert (aeckert) Kasey Le (kaseyle) Group 57 December 11, 2013 Introduction Social networks have introduced a completely new medium for communication

### Analytical Study of Some Selected Classification Algorithms in WEKA Using Real Crime Data

Analytical Study of Some Selected Classification Algorithms in WEKA Using Real Crime Data Obuandike Georgina N. Department of Mathematical Sciences and IT Federal University Dutsinma Katsina state, Nigeria

### Online Ensemble Learning: An Empirical Study

Online Ensemble Learning: An Empirical Study Alan Fern AFERN@ECN.PURDUE.EDU Robert Givan GIVAN@ECN.PURDUE.EDU Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 4797

### Backward Sequential Feature Elimination And Joining Algorithms In Machine Learning

San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Backward Sequential Feature Elimination And Joining Algorithms In Machine Learning Sanya

### AP Statistics Practice Test Unit Five Randomness and Probability. Name Period Date

AP Statistics Practice Test Unit Five Randomness and Probability Name Period Date Vocabulary: Define each word and give an example 1. Disjoint 2. Complements 3. Intersection Short Answer: 4. Explain the

### Stay Alert!: Creating a Classifier to Predict Driver Alertness in Real-time

Stay Alert!: Creating a Classifier to Predict Driver Alertness in Real-time Aditya Sarkar, Julien Kawawa-Beaudan, Quentin Perrot Friday, December 11, 2014 1 Problem Definition Driving while drowsy inevitably