Decision Tree Learning. CSE 6003 Machine Learning and Reasoning


Outline
- What is Decision Tree Learning?
- What is a Decision Tree?
- Decision Tree Examples
- Decision Trees to Rules
- Decision Tree Construction
- Decision Tree Algorithms
- Decision Tree Overfitting

Paradigms of Machine Learning
- Neural Networks
- Genetic Algorithms
- Decision Trees
- Bayesian Learning
The decision tree technique is one of these machine learning paradigms.

Learning Types
- Supervised Learning
  - Classification: Decision Tree Learning, Bayesian Learning, Nearest Neighbour, Neural Networks, Support Vector Machines
  - Regression
- Unsupervised Learning
  - Clustering
  - Association Analysis
  - Sequence Analysis
  - Summarization: Descriptive Statistics
  - Outlier Analysis
  - Scoring
Decision Tree Learning is a supervised learning method.

Decision Tree Learning Decision Tree Learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. Decision Tree Learning is robust to noisy data and capable of learning disjunctive expressions. It is one of the most widely used methods for inductive inference. (Example tree: tests on Salary < 1M, Job = teacher, and Age < 30 lead to leaves labelling a hiring decision as Good or Bad.)

Decision Tree Representation Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of that attribute.

Decision Trees
A decision tree is a tree where:
- internal nodes are simple decision rules on one or more attributes,
- each branch corresponds to an attribute value,
- leaf nodes are predicted class labels.
Decision trees are used for deciding between several courses of action.

Example training data (buys_computer):

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

The learned tree tests age at the root: age <=30 leads to a test on student (no -> no, yes -> yes), age 31..40 predicts yes, and age >40 leads to a test on credit_rating (excellent -> no, fair -> yes).
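The classification procedure just described (sort an instance down the tree from root to leaf) can be sketched in a few lines of Python. The nested-dict encoding of the buys_computer tree below is my own choice, not from the slides:

```python
# Sketch of the buys_computer tree, encoded as nested dicts:
# {attribute: {value: subtree-or-class-label}}.
tree = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(tree, instance):
    """Walk from the root to a leaf, following the branch that
    matches the instance's value for each tested attribute."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))          # attribute tested at this node
        tree = tree[attribute][instance[attribute]]
    return tree                               # leaf: the predicted class

print(classify(tree, {"age": "<=30", "student": "yes"}))  # -> yes
```

The same `classify` function works for any tree in this encoding, which is what makes the representation convenient.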

Decision Tree Applications Decision trees have been used for: 1. Classification (leaf nodes labelled with classes, e.g. class1 ... class5). 2. Data reduction: starting from the initial attribute set {A1, A2, A3, A4, A5, A6}, a tree that only tests A4, A1, and A6 yields the reduced attribute set {A1, A4, A6}.

Decision Tree Example A credit card company receives thousands of applications for new cards. Each application contains information about an applicant: age, marital status, annual salary, outstanding debts, credit rating, etc. Problem: to decide whether an application should be approved, i.e. to classify applications into two categories, approved and not approved.

Decision Tree Example (Cont) Approved or not

Decision Tree Example (Cont) Decision nodes and leaf nodes (classes)

Decision Tree Example (Cont) Construct a classification model from the data. Use the model to classify future loan applications into Yes (approved) and No (not approved). What is the class for the following case/instance?

Use the Decision Tree (Cont) Once the tree is trained, a new instance is classified by starting at the root and following the path dictated by the test results for this instance; here the instance is classified as No.

Decision Tree Example
Problem: decide whether to wait for a table at a restaurant.
Attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Decision Tree Example (Cont.) Classification of examples is positive (T) or negative (F)

Decision Tree Example (Cont.) Here is the true tree for deciding whether to wait

Decision Trees to Rules

Decision Trees to Rules It is easy to derive a rule set from a decision tree: write a rule for each path in the decision tree from the root to a leaf. These paths can be represented as if-then rules. Example: IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No
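The path enumeration can be done mechanically. A minimal sketch (the nested-dict tree encoding and the function name are my own; the example tree is a PlayTennis fragment consistent with the slide's rule):

```python
def tree_to_rules(tree, conditions=()):
    """Enumerate root-to-leaf paths of a nested-dict decision tree
    and emit one IF-THEN rule per path."""
    if not isinstance(tree, dict):                 # leaf: emit the rule
        antecedent = " AND ".join(f"{a}={v}" for a, v in conditions)
        return [f"IF {antecedent} THEN {tree}"]
    attribute = next(iter(tree))                   # attribute tested here
    rules = []
    for value, subtree in tree[attribute].items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# Hypothetical PlayTennis fragment matching the slide's example rule.
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes"}}
for rule in tree_to_rules(tree):
    print(rule)
```

Each rule's antecedent is the conjunction of attribute tests along one path, exactly as in the slide's example.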

Decision Trees to Rules

Decision Tree Construction

Decision Tree
- Each node tests some attribute of the instance.
- Instances are represented by attribute-value pairs.
- High-information-gain attributes are placed close to the root.
- Root: the best attribute for classification.
Which attribute is the best classifier? The answer is based on information gain.

Entropy
Entropy specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of S. In general, for m classes:

Entropy(S) = - Σ_{i=1..m} p_i log2(p_i)

Example for two class labels:

Entropy(S) = - p1 log2(p1) - p2 log2(p2)
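The formula translates directly into code; a short sketch (the counts-based interface is my own choice, and terms with p_i = 0 are taken as 0 by convention):

```python
import math

def entropy(class_counts):
    """Entropy(S) = -sum_i p_i * log2(p_i); zero counts contribute 0."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

# 9 positive and 5 negative examples:
print(round(entropy([9, 5]), 2))  # -> 0.94
```

A 50/50 split gives the maximum entropy of 1 bit, and a pure set gives 0, matching the intuition that entropy measures class impurity.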

Entropy

Information Gain
Measures the expected reduction in entropy given the value of some attribute A:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

Values(A): set of all possible values for attribute A
S_v: subset of S for which attribute A has value v
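A minimal sketch of this formula in Python. The toy data below reproduces the Wind column of the classic PlayTennis data set (an assumption on my part: 6 Weak/Yes, 2 Weak/No, 3 Strong/Yes, 3 Strong/No), for which the slides later compute Gain(S, Wind) = 0.048:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

# Wind column of the PlayTennis data (assumed values).
pairs = ([("Weak", "Yes")] * 6 + [("Weak", "No")] * 2 +
         [("Strong", "Yes")] * 3 + [("Strong", "No")] * 3)
data = [{"Wind": w, "Play": p} for w, p in pairs]
print(round(gain(data, "Wind", "Play"), 3))  # -> 0.048
```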

Decision Tree Example Which attribute first?

Decision Tree Example (Cont.)

Decision Tree Example (Cont.)
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048

For example:
Gain(S, Wind) = Entropy(S) - (|S_Weak|/|S|) Entropy(S_Weak) - (|S_Strong|/|S|) Entropy(S_Strong) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Gain(S, Humidity) = Entropy(S) - (|S_High|/|S|) Entropy(S_High) - (|S_Normal|/|S|) Entropy(S_Normal) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
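These numbers can be verified programmatically. The sketch below assumes the standard 14-example PlayTennis training set from Mitchell's textbook (which the D1-D14 labels and the gains in this example match); the computed gains agree with the slide to within rounding:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

# Assumed: the standard PlayTennis data (D1-D14 in Mitchell's book).
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
cols = ("Outlook", "Temperature", "Humidity", "Wind", "Play")
data = [dict(zip(cols, r)) for r in rows]
for attribute in cols[:-1]:
    print(attribute, round(gain(data, attribute, "Play"), 3))
```

Outlook has the highest gain, which is why it becomes the root of the tree.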

Decision Tree Example (Cont.)

Decision Tree Construction
Which attribute is next? The root tests Outlook; the Overcast branch is already pure (Yes), so the Sunny and Rain branches must be split further. For the Sunny branch:

Gain(S_Sunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
Gain(S_Sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(S_Sunny, Temperature) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570

Decision Tree Example (Cont.) The final tree: Outlook at the root; Overcast -> Yes [D3, D7, D12, D13]; Sunny -> Humidity (High -> No [D1, D2, D8], Normal -> Yes [D9, D11]); Rain -> Wind (Strong -> No [D6, D14], Weak -> Yes [D4, D5, D10]).

Another Example
At the weekend you can:
- go shopping,
- watch a movie,
- play tennis, or
- just stay in.
What you do depends on three things:
- the weather (windy, rainy or sunny);
- how much money you have (rich or poor);
- whether your parents are visiting.

Another Example (Cont.)

Another Example

height  hair   eyes   class
short   blond  blue   +
tall    blond  brown  -
tall    red    blue   +
short   dark   blue   -
tall    dark   blue   -
tall    blond  blue   +
tall    dark   brown  -
short   blond  brown  -

I(3+, 5-) = -(3/8) log2(3/8) - (5/8) log2(5/8) = 0.954

Height: short (1+, 2-), tall (2+, 3-)
Gain(height) = 0.954 - (3/8) I(1+, 2-) - (5/8) I(2+, 3-) = 0.954 - (3/8)(-(1/3) log2(1/3) - (2/3) log2(2/3)) - (5/8)(-(2/5) log2(2/5) - (3/5) log2(3/5)) = 0.003

Hair: blond (2+, 2-), red (1+, 0-), dark (0+, 3-)
Gain(hair) = 0.954 - (4/8)(-(2/4) log2(2/4) - (2/4) log2(2/4)) - (1/8)*0 - (3/8)*0 = 0.954 - 0.5 = 0.454

Eyes: blue (3+, 2-), brown (0+, 3-)
Gain(eyes) = 0.954 - (5/8)(-(3/5) log2(3/5) - (2/5) log2(2/5)) - (3/8)*0 = 0.954 - 0.607 = 0.347

Hair is the best attribute.

Another Example (Cont.) Splitting on hair:
- hair = dark: short, dark, blue: -; tall, dark, blue: -; tall, dark, brown: - (all negative)
- hair = red: tall, red, blue: + (positive)
- hair = blond: short, blond, blue: +; tall, blond, brown: -; tall, blond, blue: +; short, blond, brown: - (mixed, so this branch must be split further)
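Choosing hair at the root and then splitting the mixed blond branch is exactly the recursive construction loop: pick the highest-gain attribute, partition the examples, recurse until every branch is pure. A minimal ID3-style sketch (my own implementation, assuming categorical attributes and ignoring edge cases such as empty branches):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(examples, attr, target):
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                 # pure node: make a leaf
        return labels[0]
    if not attributes:                        # nothing left to test: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    branches = {}
    for v in {e[best] for e in examples}:     # one branch per attribute value
        subset = [e for e in examples if e[best] == v]
        branches[v] = id3(subset, [a for a in attributes if a != best], target)
    return {best: branches}

# The 8-example height/hair/eyes data set from the slides.
rows = [("short", "blond", "blue", "+"), ("tall", "blond", "brown", "-"),
        ("tall", "red", "blue", "+"), ("short", "dark", "blue", "-"),
        ("tall", "dark", "blue", "-"), ("tall", "blond", "blue", "+"),
        ("tall", "dark", "brown", "-"), ("short", "blond", "brown", "-")]
data = [dict(zip(("height", "hair", "eyes", "class"), r)) for r in rows]
tree = id3(data, ["height", "hair", "eyes"], "class")
print(tree)
```

On this data the sketch reproduces the slides' result: hair at the root, dark and red branches pure, and the blond branch split on eyes.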

Decision Tree Algorithms

Decision Tree Algorithms
- ID3 (Quinlan, 1981): tries to reduce the expected number of comparisons.
- C4.5 (Quinlan, 1993): an extension of ID3; just starting to be used in data mining applications; also used for rule induction.
- CART (Breiman, Friedman, Olshen, and Stone, 1984): Classification and Regression Trees.
- CHAID (Kass, 1980): the oldest decision tree algorithm; well established in the database marketing industry.
- QUEST (Loh and Shih, 1997)

Frequency Usage

Complexity of Tree Induction
Assume:
- m attributes
- n training instances
- tree depth O(log n)
Building a tree costs O(m n log n), so the total cost is O(m n log n).

Decision Tree Advantages and Disadvantages
Positives (+)
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Rule extraction from trees (can be re-represented as if-then-else rules)
+ Easy to implement
+ Can handle a large number of features
+ Does not require any prior knowledge of the data distribution
Negatives (-)
- Cannot handle complicated relationships between features
- Problems with lots of missing data
- Output attribute must be categorical
- Limited to one output attribute
- Difficulty in designing an optimal decision tree
- Class overlap causes problems, especially when the number of classes is large