Decision trees. Subhransu Maji. CMPSCI 689: Machine Learning. 22 January 2015

Overview: What does it mean to learn? The machine learning framework. The decision tree model and a greedy learning algorithm. Formalizing the learning problem. Inductive bias. Underfitting and overfitting. Model, parameters, and hyperparameters. 2/27

What does it mean to learn? Alice has just begun taking a machine learning course. Bob, the instructor, has to ascertain whether Alice has learned the topics covered by the end of the course. A common way of doing this is to give her an exam. What is a reasonable exam? Choice 1: History of pottery. Alice's performance is not indicative of what she learned in ML. Choice 2: Questions answered during lectures. A bad choice, especially if the exam is open book. A good exam should test her ability to answer related but new questions. This tests whether Alice has the ability to generalize; generalization is one of the central concepts in ML. 3/27

What does it mean to learn? Student ratings of undergrad CS courses: we have a collection of students and courses, and each evaluation is a score from -2 (terrible) to +2 (awesome). The job is to say whether a particular student (say, Alice) will like a particular course (say, Algorithms). We are given historical data, i.e., course ratings from the past, and we are trying to predict unseen ratings (i.e., the future). We can ask: Will Alice like History of pottery? Unfair, because the system doesn't even know what that is; this demands too much generalization. Will Alice like AI? Easy if Alice took AI last year and said it was +2 (awesome); this requires too little generalization. 4/27

Machine learning framework. Training data: for Alice in the ML course, the concepts she encounters in class; for a recommender system, past course ratings. The learning algorithm induces a function f that maps examples to labels. The set of new examples is called the test set; it is a closely guarded secret, the final exam on which the learner is going to be tested. An ML algorithm has succeeded if its performance on the test data is good. We will focus on a simple model of learning called a decision tree. [Figure: training data with known labels feeds the learned function f, which then predicts the unknown labels of the test data.] 5/27
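
As a concrete sketch of this framework (not the lecture's own code), the snippet below uses scikit-learn's DecisionTreeClassifier as the learning algorithm; the binary feature vectors and labels are made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy, made-up data: each row describes a course with binary features,
# each label says whether the student liked it.
X_train = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
y_train = ["liked", "hated", "liked", "hated"]
X_test = [[1, 0, 0], [0, 1, 1]]   # new examples the learner has never seen
y_test = ["liked", "hated"]

# The learning algorithm induces a function f (here, clf.predict) from the training data.
clf = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

print(clf.predict(X_test))        # f applied to the test set
print(clf.score(X_test, y_test))  # fraction of test labels predicted correctly
```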

The decision tree model of learning. A classic and natural model of learning. Question: will an unknown user enjoy an unknown course? You: Is the course under consideration in Systems? Me: Yes. You: Has this student taken any other Systems courses? Me: Yes. You: Has this student liked most previous Systems courses? Me: No. You: I predict this student will not like this course. Goal of the learner: figure out what questions to ask, in what order, and what to predict once enough questions have been answered. 6/27
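
The dialogue above is itself a tiny decision tree. A minimal hand-coded sketch of it follows; the feature names are hypothetical stand-ins for the lecture's questions, and the predictions on branches the dialogue never reached are guesses.

```python
def will_like_course(example):
    """Predict 'like' or 'not like' by asking the dialogue's questions in order."""
    if example["systems_course"]:
        if example["taken_other_systems"]:
            if example["liked_other_systems"]:
                return "like"
            return "not like"   # took Systems courses before and mostly disliked them
        return "like"           # a guess: the dialogue never reached this branch
    return "like"               # likewise a guess for non-Systems courses

# The path traced in the dialogue: Systems course, has taken others, did not like them.
print(will_like_course({"systems_course": True,
                        "taken_other_systems": True,
                        "liked_other_systems": False}))   # -> "not like"
```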

Learning a decision tree. Recall that one of the ingredients of learning is training data: I'll give you (x, y) pairs, i.e., a set of (attributes, label) pairs. We simplify the problem by treating ratings in {0, +1, +2} as liked and ratings in {-1, -2} as hated. Here, the questions are features, the responses are feature values, and the rating is the label. There are lots of possible trees to build; can we find a good one quickly? [Figure: the course ratings dataset.] 7/27
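
A minimal sketch of this data representation, with made-up feature values in the spirit of the course-ratings dataset (the feature names easy/ai/sys/thy/morning are assumptions, not the actual table):

```python
# Each example: a dict of binary features (the "questions") plus a numeric rating (the label).
ratings_data = [
    {"easy": True,  "ai": True,  "sys": False, "thy": True,  "morning": False, "rating": +2},
    {"easy": True,  "ai": True,  "sys": False, "thy": False, "morning": False, "rating": +1},
    {"easy": False, "ai": True,  "sys": False, "thy": True,  "morning": False, "rating": 0},
    {"easy": False, "ai": False, "sys": True,  "thy": False, "morning": True,  "rating": -1},
    {"easy": False, "ai": False, "sys": True,  "thy": True,  "morning": True,  "rating": -2},
]

def binarize(example):
    """Collapse the rating into the two classes used in the lecture."""
    features = {k: v for k, v in example.items() if k != "rating"}
    label = "liked" if example["rating"] >= 0 else "hated"
    return features, label

train = [binarize(ex) for ex in ratings_data]   # list of (attributes, label) pairs
```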

Greedy decision tree learning. If I could ask one question, what question would I ask? You want a feature that is most useful in predicting the rating of the course. A useful way of thinking about this is to look at the histogram of the labels for each feature. 8/27

What attribute is useful? Attribute = Easy? Splitting on this attribute and predicting the majority label within each branch gets # correct = 6 on one branch and # correct = 6 on the other, for # correct = 12 in total. 12/27

What attribute is useful? Attribute = Sys? Splitting on this attribute and predicting the majority label within each branch gets # correct = 10 on one branch and # correct = 8 on the other, for # correct = 18 in total. 16/27

Picking the best attribute. [Figure: # correct for each candidate attribute: 12, 12, 15, 18, 14, 13; the best attribute is the one with # correct = 18.] 17/27
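
One way to compute these "# correct" scores and pick the best attribute, sketched against the `train` list from the data-representation snippet above:

```python
from collections import Counter

def score_feature(data, feature):
    """# of examples we get right by splitting on `feature` and predicting
    the majority label within each branch."""
    correct = 0
    for value in (True, False):
        branch = [label for feats, label in data if feats[feature] == value]
        if branch:
            correct += Counter(branch).most_common(1)[0][1]
    return correct

scores = {f: score_feature(train, f) for f in train[0][0]}
best_attribute = max(scores, key=scores.get)
print(scores, best_attribute)
```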

Decision tree train 18/27
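
The slide itself is not reproduced in this transcription; as a sketch of the greedy training procedure it refers to, here is a short recursive implementation that reuses `score_feature` and `train` from the snippets above (the dictionary-based tree representation is an implementation choice, not the lecture's):

```python
from collections import Counter

def train_tree(data, features, max_depth):
    """Greedily pick the most useful remaining feature, split, and recurse."""
    labels = [label for _, label in data]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or not features or len(set(labels)) == 1:
        return {"leaf": majority}                  # stop: predict the majority label
    best = max(features, key=lambda f: score_feature(data, f))
    no_branch = [(x, y) for x, y in data if not x[best]]
    yes_branch = [(x, y) for x, y in data if x[best]]
    if not no_branch or not yes_branch:            # the split separates nothing
        return {"leaf": majority}
    rest = [f for f in features if f != best]
    return {"feature": best,
            "no": train_tree(no_branch, rest, max_depth - 1),
            "yes": train_tree(yes_branch, rest, max_depth - 1)}

tree = train_tree(train, list(train[0][0].keys()), max_depth=3)
```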

Decision tree test 19/27
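
Testing (prediction) then just walks the learned tree, answering one question per level; a sketch matching the training code above:

```python
def predict(tree, features):
    """Follow the yes/no branches until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["yes"] if features[tree["feature"]] else tree["no"]
    return tree["leaf"]

train_accuracy = sum(predict(tree, x) == y for x, y in train) / len(train)
print(train_accuracy)
```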

Formalizing the learning problem. Loss function ℓ(y, ŷ): the way we measure the performance of the classifier. Examples: regression: squared loss ℓ(y, ŷ) = (y − ŷ)² or absolute loss ℓ(y, ŷ) = |y − ŷ|; binary classification: zero-one loss; multiclass classification: also zero-one loss. 20/27
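
In code, these losses are one-liners; a sketch using the same definitions:

```python
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2          # regression

def absolute_loss(y, y_hat):
    return abs(y - y_hat)            # regression

def zero_one_loss(y, y_hat):
    return 0 if y == y_hat else 1    # binary or multiclass classification
```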

Formalizing the learning problem. Loss function: ℓ(y, ŷ). Data generating distribution D(x, y): the probability distribution from which the data comes. It assigns high probability to reasonable (x, y) pairs and low probability to unreasonable (x, y) pairs. Examples: a reasonable course x: Intro to Python; an unreasonable x: Intro to Quantum Pottery; an unreasonable pair (x, y): (AI, unlike). We don't know what D is! All we have is access to training samples drawn from D. 21/27

Formalizing the learning problem. Loss function: ℓ(y, ŷ). Training samples: drawn from an unknown distribution D. Learning problem: compute a function f that minimizes the expected loss over the distribution D(x, y). Since D is unknown, what we can actually measure on the samples we have is the training error. 22/27
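
The formulas themselves are not in this transcription; the standard definitions, in the slide's notation, are:

```latex
% Expected loss of f over the (unknown) data-generating distribution D:
\mathcal{L}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(y, f(x))\big]

% Training error: the average loss over the N training samples drawn from D:
\hat{\mathcal{L}}(f) = \frac{1}{N}\sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big)
```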

Inductive bias. What do we know before we see the data? [Figure: four examples labeled A, B, C, D.] Partition these into two groups. What is the inductive bias of the decision tree algorithm? 23/27

Underfitting and overfitting. Decision trees: underfitting corresponds to an empty decision tree (test error: ?); overfitting corresponds to a full decision tree (test error: ?). 24/27

Model, parameters, and hyperparameters. Model: the decision tree. Parameters: learned by the algorithm. Hyperparameter: the depth of the tree to consider. A typical way of setting this is to use validation data: usually set aside 2/3 of the data for training and 1/3 for testing, split the training portion into 1/2 training and 1/2 validation, and estimate the optimal hyperparameters on the validation data. [Figure: data split into training, validation, and testing.] 25/27
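
A sketch of this procedure for choosing the tree depth, reusing the toy data and the `train_tree`/`predict` helpers from the earlier snippets (the splits and the candidate depths are illustrative):

```python
import random

random.seed(0)
data = list(train)                      # (attributes, label) pairs from earlier
random.shuffle(data)

n = len(data)
test_set = data[: n // 3]               # roughly 1/3 held out for final testing
rest = data[n // 3 :]
val_set = rest[: len(rest) // 2]        # half of the remainder for validation
train_set = rest[len(rest) // 2 :]      # the other half for training

best_depth, best_acc = None, -1.0
for depth in range(1, 6):               # candidate hyperparameter values
    t = train_tree(train_set, list(train_set[0][0].keys()), max_depth=depth)
    acc = sum(predict(t, x) == y for x, y in val_set) / len(val_set)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

print(best_depth, best_acc)             # depth chosen on validation data, not on the test set
```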

Summary. Generalization is key. Inductive bias is needed to generalize beyond the training examples. The decision tree model: a greedy learning algorithm, the inductive bias of the learner, underfitting and overfitting, and model, parameters, and hyperparameters. 26/27

Slides credit: many slides are adapted from the book A Course in Machine Learning by Hal Daumé III. 27/27