Compacting Instances: Creating models


Decision Trees

Compacting Instances: Creating models

#  Food (3)  Chat (2)  Speedy (2)  Price (2)  Bar (2)  BigTip
1  great     yes       yes         adequate   no       yes
2  great     no        yes         adequate   no       yes
3  mediocre  yes       no          high       no       no
4  great     yes       yes         adequate   yes      yes

Decision Tree Example: BigTip

Food?
  great    -> Speedy?
                yes -> yes              {4}
                no  -> Price?
                         adequate -> yes  {2}
                         high     -> no   {1}
  mediocre -> no                        {3}
  yikes    -> no (default)

#  Food (3)  Chat (2)  Speedy (2)  Price (2)  Bar (2)  BigTip
1  great     yes       no          high       no       no
2  great     no        no          adequate   no       yes
3  mediocre  yes       no          high       no       no
4  great     yes       yes         adequate   yes      yes

Decision Tree Example: BigTip

Food?
  great    -> yes/no          {1, 2, 4}
  mediocre -> no              {3}
  yikes    -> no (default)

(Training data as on the previous slide.)

Decision Tree Example: BigTip

Food?
  great    -> Speedy?
                yes -> yes      {4}
                no  -> yes/no   {1, 2}
  mediocre -> no                {3}
  yikes    -> no (default)

(Training data as on the previous slide.)

Decision Tree Example: BigTip

Food?
  great    -> Speedy?
                yes -> yes              {4}
                no  -> Price?
                         adequate -> yes  {2}
                         high     -> no   {1}
  mediocre -> no                        {3}
  yikes    -> no (default)

(Training data as on the previous slide.)

Top-Down Induction of DT (simplified)

Training Data: D = {(x_1, y_1), ..., (x_n, y_n)}

TDIDT(D, c_def):
  IF all examples in D have the same class c:
      RETURN leaf with class c (or class c_def, if D is empty)
  ELSE IF no attributes are left to test:
      RETURN leaf with the class c of the majority in D
  ELSE:
      pick A as the best decision attribute for the next node
      FOR each value v_i of A:
          create a new descendant of the node
          D_i = {(x, y) in D : attribute A of x has value v_i}
          subtree t_i for v_i is TDIDT(D_i, c_def)
      RETURN tree with A as root and the t_i as subtrees
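As a rough illustration, the TDIDT procedure can be sketched in Python. This is a minimal sketch, not the course's code: the attribute-selection heuristic is passed in as pick_best (plugging in information gain gives ID3), examples are (attribute-dict, label) pairs, and the data is the four-example BigTip table from these slides.

```python
from collections import Counter

def tdidt(D, attributes, c_def, pick_best):
    """Top-down induction of a decision tree over (features, label) pairs."""
    if not D:
        return c_def                                   # empty set: default class
    classes = [y for _, y in D]
    if len(set(classes)) == 1:
        return classes[0]                              # pure node: leaf
    if not attributes:
        return Counter(classes).most_common(1)[0][0]   # no tests left: majority
    A = pick_best(D, attributes)                       # best decision attribute
    majority = Counter(classes).most_common(1)[0][0]
    branches = {}
    for v in {x[A] for x, _ in D}:
        D_v = [(x, y) for x, y in D if x[A] == v]
        branches[v] = tdidt(D_v, [a for a in attributes if a != A],
                            majority, pick_best)
    return (A, branches)

def classify(tree, x):
    while isinstance(tree, tuple):                     # internal node: (A, branches)
        A, branches = tree
        tree = branches[x[A]]
    return tree

# The BigTip training data from the slides
attrs = ["Food", "Chat", "Speedy", "Price", "Bar"]
rows = [("great", "yes", "no", "high", "no", "no"),
        ("great", "no", "no", "adequate", "no", "yes"),
        ("mediocre", "yes", "no", "high", "no", "no"),
        ("great", "yes", "yes", "adequate", "yes", "yes")]
D = [(dict(zip(attrs, r[:5])), r[5]) for r in rows]

# Naive stand-in heuristic: just take the first remaining attribute
tree = tdidt(D, attrs, "no", lambda D, a: a[0])
print([classify(tree, x) for x, _ in D])   # reproduces the training labels
```

With the naive heuristic the root is Food; a gain-based pick_best would choose the same root here but can differ on larger data.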

Example: Text Classification

Task: Learn a rule that classifies Reuters business news.
  Class +: corporate acquisitions
  Class -: other articles
  2000 training instances

Representation: Boolean attributes indicating the presence of a keyword in the article; 9947 such keywords (more accurately, word stems).

Example article (+): LAROCHE STARTS BID FOR NECO SHARES. Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possibly all, NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 GMT, March 30, 1987.

Example article (-): SALANT CORP 1ST QTR FEB 28 NET. Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.

Decision Tree for Corporate Acq.

vs = 1: -
vs = 0:
    export = 1:
    export = 0:
        rate = 1:
            stake = 1: +
            stake = 0:
                debenture = 1: +
                debenture = 0:
                    takeover = 1: +
                    takeover = 0:
                        file = 0: -
                        file = 1:
                            share = 1: +
                            share = 0: -
        and many more

Total size of tree: 299 nodes. Note: word stems expanded for improved readability.

20 Questions

I choose a number between 1 and 1000. You try to find it using yes/no questions. Which question is more informative?
  - Is the number 634?
  - Is the number a prime?
  - Is the number smaller than 500?
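The informativeness of each question can be measured in bits. As a sketch (the helper below is my own, assuming the number is drawn uniformly from 1..1000), the information in a yes/no question depends only on the probability that the answer is yes:

```python
import math

def info(p):
    """Bits of information in the answer to a yes/no question
    whose answer is 'yes' with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 1000
# Count primes up to 1000 by trial division (there are 168)
primes = sum(all(k % d for d in range(2, int(k ** 0.5) + 1))
             for k in range(2, n + 1))

print(round(info(1 / n), 3))        # "Is the number 634?"  -> 0.011 bits
print(round(info(primes / n), 3))   # "Is it a prime?"      -> 0.653 bits
print(round(info(499 / n), 3))      # "Is it < 500?"        -> 1.0 bit
```

The near-even split is the most informative question, which is why binary search asks it.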

Should we wait?

Maximum Separation

Example: TDIDT Training Data D: Which is the best decision variable? A=F, B=S, C=P

TDIDT Example

Picking the Best Attribute to Split

Ockham's Razor: all other things being equal, choose the simplest explanation.
Decision tree induction: find the smallest tree that classifies the training data correctly.
Problem: finding the smallest tree is computationally hard.
Approach: use heuristic (greedy) search.

Maximum information

Information in a set of choices (v_1, ..., v_n):
    I(P(v_1), ..., P(v_n)) = sum_i -P(v_i) log2 P(v_i)

E.g. information in a flip of a fair coin: I(1/2, 1/2) = 1 bit.
Information in an unfair (99:1) coin: I(1/100, 99/100) = 0.08 bits.
Information in a full classification of (p, n) samples: I(p/(p+n), n/(p+n)).

Maximum information

After classification by attribute A, which splits the samples into subsets (p_i, n_i), the information still needed is
    remainder(A) = sum_i (p_i + n_i)/(p + n) * I(p_i/(p_i + n_i), n_i/(p_i + n_i))

Information gain by attribute A:
    Gain(A) = I(p/(p+n), n/(p+n)) - remainder(A)
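These quantities translate directly into code. A minimal sketch of the standard entropy / remainder / gain definitions (function names are my own, not the slides' notation):

```python
import math

def info(*probs):
    """I(p_1, ..., p_n) = sum_i -p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def remainder(splits):
    """Expected information still needed after splitting on an attribute.
    splits: one (p_i, n_i) pair of positive/negative counts per value."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * info(p / (p + n), n / (p + n))
               for p, n in splits)

def gain(p, n, splits):
    """Information gain of an attribute for a set of p positives, n negatives."""
    return info(p / (p + n), n / (p + n)) - remainder(splits)

print(round(info(0.5, 0.5), 2))       # fair coin: 1.0 bit
print(round(info(0.01, 0.99), 2))     # unfair 99:1 coin: 0.08 bits
print(gain(2, 2, [(2, 0), (0, 2)]))   # a perfect split gains the full 1.0 bit
```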

Information gain Which attribute has higher information gain? A=Type B=Patrons C=Neither

Learning curve

Success as a function of training set size. A hard problem will have a:
  A - steep
  B - shallow
learning curve.

Continuous variables? Look for the optimal split point, e.g.:
    age < 40   -> young
    age >= 40  -> ancient
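One common way to find the optimal split point is to sort the values and evaluate candidate thresholds midway between adjacent examples, keeping the one with the highest information gain. A sketch under that assumption (the age data is made up for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(points):
    """points: (value, label) pairs. Returns the (threshold, gain) that
    maximizes information gain over midpoints of adjacent distinct values."""
    points = sorted(points)
    base = entropy([y for _, y in points])
    best_t, best_g = None, -1.0
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue                                  # no class boundary here
        t = (points[i - 1][0] + points[i][0]) / 2
        left = [y for v, y in points if v < t]
        right = [y for v, y in points if v >= t]
        g = base - (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(points)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

ages = [(25, "young"), (32, "young"), (38, "young"),
        (45, "ancient"), (51, "ancient"), (63, "ancient")]
print(best_split(ages))   # (41.5, 1.0): a clean cut between 38 and 45
```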

From: http://www.gisdevelopment.net/technology/rs/images/ma06110_8.jpg

Continuous output? Regression trees:
    age < 40   -> y = age/2
    age >= 40  -> y = age - 20

Spurious attributes?
  - Cross validation
  - Information gain ratio: normalize the information gain by the net information in the attribute itself
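The gain-ratio idea can be sketched as follows (function names are my own, in the spirit of C4.5): divide the gain by the intrinsic information of the split itself, so an attribute that shatters the data into many tiny subsets is penalized.

```python
import math

def split_info(sizes):
    """Net information in the attribute itself: the entropy of the
    partition it induces, given the subset sizes."""
    n = sum(sizes)
    return -sum(s / n * math.log2(s / n) for s in sizes if s)

def gain_ratio(information_gain, sizes):
    return information_gain / split_info(sizes)

# A spurious attribute that shatters 8 examples into 8 singletons carries
# 3 bits of split information, so even a full 1-bit gain is cut to 1/3.
print(split_info([1] * 8))                   # 3.0
print(round(gain_ratio(1.0, [1] * 8), 3))    # 0.333
```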

Datapoint Weighting

How can we give certain datapoints more importance than others?
  - In kNN: introduce weight factors
  - In decision trees: duplicate points, or give them more weight when choosing attributes

Ensemble learning

Boosting: create multiple classifiers that vote. Give more weight to wrongly classified samples; e.g. reweight so that the total weight of the incorrectly classified samples equals the total weight of the correctly classified ones.

Ensemble learning

If the input algorithm L is a weak learner (accuracy better than 50% on the weighted training data), then AdaBoost will return a classifier that fits the training data perfectly for a large enough number of rounds M.
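A minimal AdaBoost sketch over one-dimensional threshold "stumps" (the stump learner stands in for the weak algorithm L; labels are +1/-1; the data and M are illustrative, not from the slides). After each round the weights are renormalized, which makes the misclassified and correctly classified examples carry equal total weight.

```python
import math

def stump(X, y, w):
    """Best weighted threshold classifier h(x) = s if x >= t else -s."""
    best, best_err = None, float("inf")
    for t in sorted(set(X)):
        for s in (1, -1):
            pred = [s if x >= t else -s for x in X]
            err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
            if err < best_err:
                best, best_err = (t, s), err
    return best, best_err

def adaboost(X, y, M):
    n = len(X)
    w = [1 / n] * n
    ensemble = []                       # list of (alpha, t, s) weighted stumps
    for _ in range(M):
        (t, s), err = stump(X, y, w)
        if err == 0:                    # perfect weak hypothesis: just use it
            ensemble.append((1.0, t, s))
            break
        if err >= 0.5:                  # no weak learner available: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        pred = [s if x >= t else -s for x in X]
        w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, pred, y)]
        z = sum(w)                      # renormalize: wrong and right examples
        w = [wi / z for wi in w]        # now carry total weight 1/2 each
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

# No single stump fits the labels + + - - + +, but three boosted stumps do
X, y = [1, 2, 3, 4, 5, 6], [1, 1, -1, -1, 1, 1]
ens = adaboost(X, y, M=3)
print([predict(ens, x) for x in X])    # matches y
```

The weighted majority vote of three stumps carves out the middle interval that no single threshold can express, which is the point of the boosting guarantee above.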