Inductive Learning and Decision Trees


Doug Downey, EECS 349, Winter 2014. With slides from Pedro Domingos and Bryan Pardo.

Outline
  Announcements: Homework #1 assigned (have you completed it?)
  Inductive learning
  Decision Trees

Instances
E.g., four days, in terms of weather:

  Sky    Temp  Humid   Wind    Water  Forecast
  sunny  warm  normal  strong  warm   same
  sunny  warm  high    strong  warm   same
  rainy  cold  high    strong  warm   change
  sunny  warm  high    strong  cool   change

Functions
Days on which my friend Aldo enjoys his favorite water sport:

  INPUT                                          OUTPUT
  Sky    Temp  Humid   Wind    Water  Forecast   f(x)
  sunny  warm  normal  strong  warm   same       1
  sunny  warm  high    strong  warm   same       1
  rainy  cold  high    strong  warm   change     0
  sunny  warm  high    strong  cool   change     1

Inductive Learning!
Predict the output for a new instance:

  INPUT                                          OUTPUT
  Sky    Temp  Humid   Wind    Water  Forecast   f(x)
  sunny  warm  normal  strong  warm   same       1
  sunny  warm  high    strong  warm   same       1
  rainy  cold  high    strong  warm   change     0
  sunny  warm  high    strong  cool   change     1
  rainy  warm  high    strong  cool   change     ?

General Inductive Learning Task
DEFINE: Set X of instances (of n-tuples x = <x_1, ..., x_n>)
  E.g., days described by attributes (or features): Sky, Temp, Humid, Wind, Water, Forecast
Target function f: X -> Y, e.g.:
  EnjoySport:    X -> Y = {0, 1}
  HoursOfSport:  X -> Y = {0, 1, 2, 3, 4}
  InchesOfRain:  X -> Y = [0, 10]
GIVEN: Training examples D: examples of the target function, <x, f(x)>
FIND: A hypothesis h such that h(x) approximates f(x)

Another example: continuous attributes
Learn a function from x = (x_1, ..., x_d) to f(x) ∈ {0, 1}, given labeled examples (x, f(x)).
[Figure: labeled points plotted in the plane, with axes x_1 and x_2.]

Hypothesis Spaces
A hypothesis space H is a subset of all functions f: X -> Y, e.g.:
  Linear separators
  Conjunctions of constraints on attributes (humidity must be low, and outlook != rain)
  Etc.
In machine learning, we restrict ourselves to H.
The "subset" aspect turns out to be important.
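As a toy illustration of one element of such a restricted space, a conjunction of attribute constraints can be written as a simple predicate. This sketch is mine, not from the slides; the attribute names come from the example above:

```python
# One hypothesis from a restricted space H: a conjunction of
# constraints on attributes (names taken from the slide's example).
def h(example):
    """Return 1 iff every constraint in the conjunction holds."""
    return int(example["humidity"] == "low" and example["outlook"] != "rain")

print(h({"humidity": "low", "outlook": "sunny"}))  # 1
print(h({"humidity": "low", "outlook": "rain"}))   # 0
```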

Examples
Credit Risk Analysis
  X: properties of customer and proposed purchase
  f(x): approve (1) or disapprove (0)
Disease Diagnosis
  X: properties of patient (symptoms, lab tests)
  f(x): disease (if any)
Face Recognition
  X: bitmap image
  f(x): name of person
Automatic Steering
  X: bitmap picture of road surface in front of car
  f(x): degrees to turn the steering wheel

When to use?
Inductive learning is appropriate for building a face recognizer.
It is not appropriate for building a calculator; you'd just write a calculator program.
Question: What general characteristics make a problem suitable for inductive learning?

Think/Pair/Share
What general characteristics make a problem suitable for inductive learning?
(Think, then pair, then share.)

Appropriate applications
Situations in which:
  There is no human expert
  Humans can perform the task but can't describe how
  The desired function changes frequently
  Each user needs a customized f

Outline
  Announcements: Homework #1 assigned
  Inductive learning
  Decision Trees

Task: Will I wait for a table?

Decision Trees!

Expressiveness of D-Trees

A learned decision tree

Inductive Bias
To learn, we must prefer some functions to others.
Selection bias: use a restricted hypothesis space, e.g.:
  Linear separators
  2-level decision trees
Preference bias: use the whole concept space, but state a preference over concepts, e.g.:
  Lowest-degree polynomial that separates the data
  Shortest decision tree that fits the data

Decision Tree Learning (ID3)
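The algorithm slide itself did not survive transcription. As a rough sketch of the standard ID3 recursion it names (pick the attribute with the highest information gain, split, recurse), with function and variable names of my own choosing and examples represented as attribute -> value dicts:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum_v p(v) * log2 p(v), over the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def id3(examples, labels, attributes):
    """Greedily build a tree: split on the attribute with the highest
    information gain, then recurse on each value's subset of examples."""
    if len(set(labels)) == 1:            # pure node: return its label
        return labels[0]
    if not attributes:                   # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        remainder = 0.0
        for v in set(x[a] for x in examples):
            sub = [y for x, y in zip(examples, labels) if x[a] == v]
            remainder += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    tree = {best: {}}
    for v in set(x[best] for x in examples):
        sub_x = [x for x in examples if x[best] == v]
        sub_y = [y for x, y in zip(examples, labels) if x[best] == v]
        tree[best][v] = id3(sub_x, sub_y, [a for a in attributes if a != best])
    return tree
```

Called on the EnjoySport table above with attributes ['Sky', 'Temp', 'Humid', 'Wind', 'Water', 'Forecast'], this returns a nested dict whose internal keys are attributes and whose leaves are labels.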

Recap
Inductive learning
  Goal: generate a hypothesis (a function from instances described by attributes to an output) using training examples.
  Requires inductive bias: a restricted hypothesis space, or preferences over hypotheses.
Decision Trees
  Simple representation of hypotheses, recursive learning algorithm.
  Prefer smaller trees!

Choosing an attribute

Think/Pair/Share
How should we choose which attribute to split on next?
(Think, then pair, then share.)

Information

Entropy
[Figure: the entropy H(V) of a Boolean random variable V, plotted as the probability P(V = 0) varies from 0 to 1.]
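For reference, the standard definition behind the plot (the formula itself is not in the transcription):

```latex
H(V) = -\sum_{v} P(V = v)\,\log_2 P(V = v)
     = -p \log_2 p \;-\; (1-p)\,\log_2 (1-p), \qquad p = P(V = 0).
```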

Using Information
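The slide body was not transcribed; the standard way ID3 uses this entropy is the information gain of a candidate attribute A on a dataset D:

```latex
\mathrm{Gain}(D, A) \;=\; H(D) \;-\; \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v),
```

where D_v is the subset of D on which attribute A takes value v. The attribute with the highest gain is chosen as the next split.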

Measuring Performance

What the learning curve tells us

Overfitting

Overfitting is due to noise
Sources of noise:
  Erroneous training data
    Concept variable incorrect (annotator error)
    Attributes mis-measured
  Much more significant:
    Irrelevant attributes
    Target function not realizable in attributes

Irrelevant attributes
If many attributes are noisy, information gains can be spurious, e.g.:
  20 noisy attributes
  10 training examples
  Expected number of different depth-3 trees that split the training data perfectly using only noisy attributes: 13.4

Not realizable
In general, we can't measure all the variables we would need for perfect prediction.
=> The target function is not uniquely determined by the attribute values.

Not realizable: Example

  Humidity  EnjoySport
  0.90      0
  0.87      1
  0.80      0
  0.75      0
  0.70      1
  0.69      1
  0.65      1
  0.63      1

Decent hypothesis:
  Humidity > 0.70  -> No
  Otherwise        -> Yes

Overfit hypothesis:
  Humidity > 0.89                       -> No
  Humidity > 0.80 and Humidity <= 0.89  -> Yes
  Humidity > 0.70 and Humidity <= 0.80  -> No
  Humidity <= 0.70                      -> Yes
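As a quick check of the two hypotheses against the eight examples above (a sketch of mine; the thresholds are copied from the slide):

```python
# Training accuracy of the "decent" vs. "overfit" hypotheses
# on the eight (Humidity, EnjoySport) examples from the table.
data = [(0.90, 0), (0.87, 1), (0.80, 0), (0.75, 0),
        (0.70, 1), (0.69, 1), (0.65, 1), (0.63, 1)]

def decent(h):
    return 0 if h > 0.70 else 1

def overfit(h):
    if h > 0.89: return 0
    if h > 0.80: return 1
    if h > 0.70: return 0
    return 1

for name, f in [("decent", decent), ("overfit", overfit)]:
    acc = sum(f(h) == y for h, y in data) / len(data)
    print(name, acc)   # decent 0.875, overfit 1.0
```

The overfit hypothesis fits the training data perfectly, but only by carving thresholds around individual noisy examples.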

Avoiding Overfitting
Approaches:
  Stop splitting when information gain is low or when the split is not statistically significant.
  Grow the full tree, then prune it when done.

Effect of Reduced Error Pruning

Cross-validation
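The slide content was not transcribed; as a minimal sketch of k-fold cross-validation in plain Python, with `train` and `evaluate` as placeholders for any learner (e.g., the ID3 sketch above):

```python
def cross_validate(examples, labels, train, evaluate, k=5):
    """Average held-out score over k contiguous folds (no shuffling)."""
    n = len(examples)
    scores = []
    for i in range(k):
        test = set(range(i * n // k, (i + 1) * n // k))  # fold i's indices
        tr_x = [x for j, x in enumerate(examples) if j not in test]
        tr_y = [y for j, y in enumerate(labels) if j not in test]
        te_x = [x for j, x in enumerate(examples) if j in test]
        te_y = [y for j, y in enumerate(labels) if j in test]
        model = train(tr_x, tr_y)
        scores.append(evaluate(model, te_x, te_y))
    return sum(scores) / k
```

For tree pruning, the held-out score is typically used to decide how aggressively to prune.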

C4.5 Algorithm
Builds a decision tree from labeled training data.
Generalizes the simple ID3 tree by:
  Pruning the tree after building, to improve generality
  Allowing missing attribute values in examples
  Allowing continuous-valued attributes

Rule post-pruning
Used in C4.5. Steps:
1. Build the decision tree.
2. Convert it to a set of logical rules.
3. Prune each rule independently.
4. Sort the rules into the desired sequence for use.
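A minimal sketch of step 2, assuming the nested-dict tree format from the ID3 sketch above (names are my own):

```python
def tree_to_rules(tree, conditions=()):
    """Each root-to-leaf path becomes one rule:
    (list of (attribute, value) tests, predicted label)."""
    if not isinstance(tree, dict):               # leaf: emit a finished rule
        return [(list(conditions), tree)]
    (attr, branches), = tree.items()             # single-key dict per node
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attr, value),))
    return rules
```

Each rule can then be pruned (dropping tests that don't hurt held-out accuracy) independently of the others, which is more flexible than pruning whole subtrees.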

Other Odds and Ends
Unknown attribute values?
Continuous attributes?
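The slides' answers were not transcribed. For continuous attributes, one standard approach (the one C4.5 uses) is to try split thresholds between adjacent sorted values and keep whichever gives the best gain; a minimal sketch:

```python
# Candidate split thresholds for a continuous attribute:
# midpoints between adjacent distinct sorted values.
def candidate_thresholds(values):
    s = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(s, s[1:])]

print(candidate_thresholds([0.90, 0.87, 0.80, 0.75]))
# midpoints between adjacent distinct values: ~0.775, ~0.835, ~0.885
```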

Decision Tree Boundaries

Decision Trees' Bias
How to solve 2-bit parity:
  Two-step look-ahead, or
  Split on pairs of attributes at once
For k-bit parity, why not just do k-step look-ahead? Or split on k attribute values?
=> Parity functions are among the victims of the decision tree's inductive bias.
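To see why ordinary one-step greedy splitting fails here: on 2-bit parity, every single attribute has zero information gain, so ID3 has no basis for choosing a split. A small self-contained check (my own sketch, not from the slides):

```python
import math

def H(labels):
    """Entropy (in bits) of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# 2-bit parity (XOR): label = x1 xor x2
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
all_labels = [y for _, y in xor]
for a in (0, 1):
    branches = [[y for x, y in xor if x[a] == v] for v in (0, 1)]
    gain = H(all_labels) - sum(len(b) / len(xor) * H(b) for b in branches)
    print(f"information gain of attribute {a}: {gain}")   # 0.0 for both
```

Each branch still contains one positive and one negative example, so entropy stays at 1 bit no matter which single attribute is tested.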

Takeaways about decision trees
  Used as classifiers
  Supervised learning algorithms (ID3, C4.5)
  Good for situations where:
    Inputs and outputs are discrete
    We think the true function is a small tree