Decision Trees. Vibhav Gogate The University of Texas at Dallas


Recap: Supervised Learning
Given: training data with the desired output.
Assumption: there exists a function f that maps each input x to an output f(x).
Goal: find a good approximation to f.
Classification: the output f(x) is discrete.
What makes learning hard? Several issues.

Notes: along the unique path from the root to a leaf, a discrete feature can be tested at most once. Question: can I test Humidity again with a threshold of 95? Yes, because each threshold on a continuous attribute acts as a different discrete (boolean) feature.
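To make this concrete, here is a minimal sketch (the attribute name and helper are illustrative, not from the slides) of how thresholds on one continuous attribute become distinct boolean features:

```python
def threshold_features(value, thresholds):
    """Each candidate threshold on a continuous attribute yields its own
    boolean feature, so the same attribute can be tested more than once
    along a root-to-leaf path, as long as the thresholds differ."""
    return {f"Humidity<{t}": value < t for t in thresholds}

print(threshold_features(93, [80, 95]))
# {'Humidity<80': False, 'Humidity<95': True}
```

A tree may first test Humidity < 80 near the root and later test Humidity < 95 deeper down: to the tree-growing algorithm these are two different boolean tests.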

[Figure: a decision tree built on continuous features, grown step by step with splits such as x2 < 5]

Can you put a bound on the number of leaf nodes?

The following questions may arise in your mind:
How do we choose the best attribute, i.e., which property to test at a node?
When should we declare a particular node a leaf?
What types of trees should we prefer: smaller, larger, balanced?
If a leaf node is impure (has both positive and negative examples), what should we do?
What if some attribute value is missing?

Choosing the Best Attribute
Fundamental principle underlying tree creation: simplicity (prefer smaller trees).
Occam's Razor: the simplest model that explains the data should be preferred.
Each node divides the data into subsets.
Heuristic: make each subset as pure as possible.

Choosing the Best Attribute: the Information Gain Heuristic
Entropy, denoted by H, is a measure of impurity.
Gain = current impurity - new impurity, i.e., the reduction in impurity; choose the attribute that maximizes the gain.
The second term is actually the expected entropy after the split: weigh the entropy of each bin by the fraction of the data that falls in it.
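The heuristic can be sketched in a few lines of pure Python (a minimal illustration, not any particular library's implementation):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H = -sum_i p_i * log2(p_i), where p_i is the fraction of class i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain = current impurity minus the expected impurity after splitting,
    where each subset's entropy is weighted by its share of the data."""
    n = len(labels)
    expected = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == value]
        expected += (len(subset) / n) * entropy(subset)
    return entropy(labels) - expected
```

Note the extremes: entropy is 1 bit for a 50/50 split and 0 when all labels agree, so gain is highest for attributes whose subsets are nearly pure.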

[Figure: entropy of a binary label as a function of the class proportion. H = 1 when the examples are 50% positive and 50% negative; H = 0 when all examples are positive or all are negative.]

When do I play tennis? [Table: the PlayTennis training examples]

Decision Tree [Figure: the decision tree learned for PlayTennis]

Is the decision tree correct? Let's check whether the split on the Wind attribute is correct: we need to show that Wind has the highest information gain.


Wind attribute: 5 records match. Note: calculate the entropy only on the examples that were routed to this branch of the tree (Outlook = Rain).
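This check can be reproduced in a few lines. The data below is the standard PlayTennis table (Mitchell) restricted to Outlook = Rain; the (Wind, Humidity, label) tuple layout is my own:

```python
from math import log2

# The 5 PlayTennis examples with Outlook = Rain, as (Wind, Humidity, PlayTennis).
rain = [("Weak", "High", "Yes"), ("Weak", "Normal", "Yes"),
        ("Strong", "Normal", "No"), ("Weak", "Normal", "Yes"),
        ("Strong", "High", "No")]

def H(labels):
    """Entropy of a list of class labels."""
    return -sum(p * log2(p)
                for p in (labels.count(c) / len(labels) for c in set(labels)))

def gain(col):
    """Information gain of splitting the Rain branch on column `col`."""
    labels = [r[2] for r in rain]
    expected = sum(
        len([r for r in rain if r[col] == v]) / len(rain)
        * H([r[2] for r in rain if r[col] == v])
        for v in {r[col] for r in rain})
    return H(labels) - expected

print(f"Gain(Wind)     = {gain(0):.3f}")  # 0.971 -- a perfect split
print(f"Gain(Humidity) = {gain(1):.3f}")  # 0.020
```

Wind splits this branch perfectly (Weak routes to all Yes, Strong to all No), so its gain equals the full branch entropy of 0.971 bits, which no other attribute can beat.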

Practical Issues in Decision Tree Learning
Overfitting: when should we stop growing a tree?
Handling non-boolean attributes.
Handling missing attribute values.

Sources of Overfitting
Noise.
A small number of examples associated with each leaf: if only one example reaches a leaf, can you believe it?
Coincidental regularities.
Generalization is the most important criterion: your method should work well on examples you have not seen before.

Avoiding Overfitting
Two approaches: stop growing the tree when a data split is not statistically significant, or grow the tree fully and then post-prune.
Key issue: what is the correct tree size?
Divide the data into a training set and a validation set (random noise in the two sets may differ).
Apply a statistical test to estimate whether expanding a particular node is likely to produce an improvement beyond the training set.
Add a complexity penalty.
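The "statistically significant split" test can be sketched with the standard chi-squared statistic (the helper name and the example counts are illustrative, not from the slides):

```python
def chi_squared_split_stat(parent_pos, parent_neg, children):
    """Chi-squared statistic for a candidate split.
    children: list of (pos, neg) counts in each branch.
    Compares observed class counts against those expected if the split
    were irrelevant (each branch keeping the parent's class ratio)."""
    total = parent_pos + parent_neg
    stat = 0.0
    for pos, neg in children:
        n = pos + neg
        exp_pos = n * parent_pos / total
        exp_neg = n * parent_neg / total
        stat += (pos - exp_pos) ** 2 / exp_pos + (neg - exp_neg) ** 2 / exp_neg
    return stat

# Parent node: 10+/10-. Candidate split: branches with (8+,2-) and (2+,8-).
print(chi_squared_split_stat(10, 10, [(8, 2), (2, 8)]))  # 7.2
```

Here 7.2 exceeds the 3.84 critical value (1 degree of freedom, significance level 0.05), so the split looks real and the node is worth expanding; a statistic near 0 would suggest stopping.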

Rule Post-Pruning
Induce the decision tree from the full training set (allowing it to overfit).
Convert the decision tree to a set of rules.
Prune each rule by removing any precondition whose removal improves the estimated accuracy.
Estimate accuracy using a validation set.
Sort the rules by their estimated accuracy.
Classify new instances using the sorted sequence of rules.

Handling Missing Values
Some attribute values are missing. Example: patient data, where you don't expect blood test results for everyone. Options:
Treat the missing value as just another value.
Ignore instances that have missing values (problematic, because it throws away data).
Assign the most common value of the attribute.
Assign the most common value among examples of the same class.
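The last two options can be sketched as follows (a minimal illustration; the row format and function names are my own, and missing values are marked with None):

```python
from collections import Counter

def impute_most_common(rows, attr):
    """Fill missing (None) values of attr with the overall most common value."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(r, **{attr: mode}) if r[attr] is None else r for r in rows]

def impute_by_class(rows, attr, label):
    """Fill missing values with the most common value among rows of the
    same class, which preserves more of the attribute-class correlation."""
    filled = []
    for r in rows:
        if r[attr] is None:
            same = [s[attr] for s in rows
                    if s[label] == r[label] and s[attr] is not None]
            filled.append(dict(r, **{attr: Counter(same).most_common(1)[0][0]}))
        else:
            filled.append(r)
    return filled
```

The class-conditional version matters when the attribute distribution differs between classes: the overall mode may be the wrong guess for a minority class.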

Handling Missing Values: a probabilistic approach. Instead of committing to a single value, send a fraction of the example down each branch, in proportion to the value frequencies observed in the training data (as in C4.5).

Summary: Decision Trees
Representation.
Tree growth: choosing the best attribute.
Overfitting and pruning.
Special cases: missing attributes and continuous attributes.
Many forms in practice: CART, ID3, C4.5.