CSC 4510/9010: Applied Machine Learning Rule Inference


CSC 4510/9010: Applied Machine Learning, Rule Inference
Dr. Paula Matuszek
Paula.Matuszek@villanova.edu / Paula.Matuszek@gmail.com
(610) 647-9789
CSC 4510.9010 Spring 2015. Paula Matuszek

Red Tape
- Going to use Blackboard for remaining submissions.
- Assignment 3 should be there. Let me know if you have a problem seeing it.

Grad Student Presentations
Please send me email when you have your topic, but at least a week ahead of time.
- Feb 10: Raja Harish Vempati
- Feb 17: Bharadwaj Vadlamannati
- Feb 24: Midterm
- Mar 3: Spring break
- Mar 10: Nikhil Dasari
- Mar 17: Sruthi Moola
- Mar 24: Gopi Krishna Chitluri
- Mar 31: Pradeep Musku
- Apr 7: Sai Koushik Haddunoori

Output: Representing Structural Patterns
- Many different ways of representing patterns: decision trees, rules, instance-based, ...
- Also called knowledge representation
- Representation determines the inference method
- Understanding the output is the key to understanding the underlying learning methods
- Different types of output for different learning problems (e.g. classification, regression, ...)
Data Mining: Practical Machine Learning Tools and Techniques (Chapter 3)

Nominal and Numeric Attributes
- Nominal: number of children usually equal to number of values; attribute won't get tested more than once
  - Other possibility: division into two subsets
- Numeric: test whether value is greater or less than a constant; attribute may get tested several times
  - Other possibility: three-way split (or multi-way split)
    - Integer: less than, equal to, greater than
    - Real: below, within, above

Missing Values
- Does absence of a value have some significance?
  - Yes: "missing" is a separate value
  - No: "missing" must be treated in a special way
- Solution A: assign the instance to the most popular branch
- Solution B: split the instance into pieces
  - Pieces receive weight according to the fraction of training instances that go down each branch
  - Classifications from leaf nodes are combined using the weights that have percolated to them
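Solution B can be sketched in a few lines of Python. This is only an illustration: the split (8 of 12 training instances going left) and the leaf class distributions are made-up numbers, not taken from the slides.

```python
# Solution B: send a fraction of the instance down each branch and
# combine the leaf class distributions by the accumulated weights.
def classify_with_missing(branch_fractions, leaf_distributions):
    """branch_fractions: weight of the instance sent down each branch.
    leaf_distributions: class-probability dict at each branch's leaf."""
    combined = {}
    for weight, dist in zip(branch_fractions, leaf_distributions):
        for cls, p in dist.items():
            combined[cls] = combined.get(cls, 0.0) + weight * p
    return max(combined, key=combined.get)

# hypothetical two-way split: 8 of 12 training instances went left
fractions = [8 / 12, 4 / 12]
leaves = [{"yes": 0.75, "no": 0.25},  # class distribution at left leaf
          {"yes": 0.25, "no": 0.75}]  # class distribution at right leaf
print(classify_with_missing(fractions, leaves))
```

The weighted vote here is yes = 2/3 * 0.75 + 1/3 * 0.25 against no = 2/3 * 0.25 + 1/3 * 0.75, so the left leaf dominates.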

Simplicity First
- Simple algorithms often work very well!
- There are many kinds of simple structure, e.g.:
  - One attribute does all the work
  - All attributes contribute equally and independently
  - A weighted linear combination might do
  - Instance-based: use a few prototypes
  - Use simple logical rules
- Success of a method depends on the domain
Data Mining: Practical Machine Learning Tools and Techniques (Chapter 4)

Classification Rules
- Popular alternative to decision trees
- Antecedent (pre-condition): a series of tests, just like the tests at the nodes of a decision tree
  - Tests are usually logically ANDed together (but may also be general logical expressions)
- Consequent (conclusion): a class, set of classes, or probability distribution assigned by the rule
- Individual rules are often logically ORed together
  - Conflicts arise if different conclusions apply

Model Spaces
- Decision trees: partition the instance space into axis-parallel regions, labeled with a class value
- Nearest-neighbor classifiers: partition the instance space into regions defined by the centroid instances (or clusters of k instances)
- Associative rules (feature values -> class) (more to come!)
CSC 8520 Fall 2008, Paula Matuszek. Some slides from M. DesJardins, www.cs.umbc.edu/671/fall01, and http://aima.eecs.berkeley.edu/slides-ppt/m18-learning.ppt

Rule Induction
- Given:
  - Features
  - Training examples
  - Output for training examples
- Generate automatically a set of rules or a decision tree which will allow you to judge new objects
- Basic approach:
  - Combinations of features become antecedents or links
  - Examples become consequents or nodes

Rule Induction Example
- Starting with 100 cases, 10 outcomes, 15 variables
- Form 100 rules, each with 15 antecedents and one consequent
- Collapse rules:
  - Cancellations: if we have C, A => B and not C, A => B, collapse to A => B
  - Drop terms: if we have D, E => F and D, G => F, collapse to D => F
- Test rules and undo a collapse if performance gets worse
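The cancellation step can be sketched as follows. This is a minimal illustration of the idea, assuming rules are encoded as a frozenset of (feature, value) antecedents plus a consequent; the encoding is hypothetical, not from the slides.

```python
def cancel(rule_a, rule_b):
    """If two rules have the same consequent and antecedents that differ
    only in the sign of one literal, that literal is irrelevant: drop it.
    A rule is (frozenset of (feature, value) antecedents, consequent)."""
    ants_a, con_a = rule_a
    ants_b, con_b = rule_b
    if con_a != con_b:
        return None
    diff = ants_a ^ ants_b              # literals not shared by both rules
    feats = {f for f, _ in diff}
    if len(diff) == 2 and len(feats) == 1:  # same feature, opposite signs
        return (ants_a - diff, con_a)
    return None

# C, A => B  together with  not C, A => B  collapses to  A => B
r1 = (frozenset({("C", True), ("A", True)}), "B")
r2 = (frozenset({("C", False), ("A", True)}), "B")
print(cancel(r1, r2))  # drops the C term, leaving A => B
```

The same structure, with a slightly different test on the symmetric difference, would implement the drop-terms collapse.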

Rose Diagnosis

Case  Yellow Leaves  Wilted Leaves  Brown Spots  Diagnosis
1     N              Y              Y            Fungus
2     N              Y              Y            Bugs
3     Y              N              N            Nutrition
4     N              N              Y            Fungus
5     Y              N              Y            Fungus
6     Y              Y              N            Bugs

R1: If not yellow leaves and wilted leaves and brown spots then fungus.
R6: If wilted leaves and yellow leaves and not brown spots then bugs.

Rose Diagnosis, continued
- Cases 1 and 4 have opposite values for wilted leaves, so create a new rule:
  - R7: If not yellow leaves and brown spots then fungus.
- KB is the rules. Learner is the system collapsing and testing rules. Critic is the test cases. Performer is rule-based inference.
- Problems:
  - Over-generalization
  - Irrelevance
  - Need data on all features for all training cases
  - Computationally painful
- Useful if you have enough good training cases
- Output can be understood and modified by humans

Inferring Rudimentary Rules
- 1R: learns a 1-level decision tree
  - I.e., rules that all test one particular attribute
- Basic version:
  - One branch for each value
  - Each branch assigns the most frequent class
  - Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
  - Choose the attribute with the lowest error rate (assumes nominal attributes)

Pseudo-code for 1R

For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate

Note: "missing" is treated as a separate attribute value
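The pseudo-code above translates almost line for line into Python. This is a minimal sketch, not Weka's OneR implementation; the data used to exercise it is the nominal weather dataset that appears later in these slides.

```python
from collections import Counter, defaultdict

def one_r(instances, labels, attribute_index):
    """Build the 1R rule set for one attribute: map each observed value
    to its most frequent class, and count the resulting errors."""
    by_value = defaultdict(Counter)
    for inst, label in zip(instances, labels):
        by_value[inst[attribute_index]][label] += 1
    rules, errors = {}, 0
    for value, counts in by_value.items():
        majority, majority_count = counts.most_common(1)[0]
        rules[value] = majority
        errors += sum(counts.values()) - majority_count
    return rules, errors

def best_one_r(instances, labels):
    """Choose the attribute whose 1R rules make the fewest errors."""
    return min(
        ((i,) + one_r(instances, labels, i) for i in range(len(instances[0]))),
        key=lambda t: t[2],
    )

# attributes: outlook, temp, humidity, windy
weather = [
    ("Sunny", "Hot", "High", "False"), ("Sunny", "Hot", "High", "True"),
    ("Overcast", "Hot", "High", "False"), ("Rainy", "Mild", "High", "False"),
    ("Rainy", "Cool", "Normal", "False"), ("Rainy", "Cool", "Normal", "True"),
    ("Overcast", "Cool", "Normal", "True"), ("Sunny", "Mild", "High", "False"),
    ("Sunny", "Cool", "Normal", "False"), ("Rainy", "Mild", "Normal", "False"),
    ("Sunny", "Mild", "Normal", "True"), ("Overcast", "Mild", "High", "True"),
    ("Overcast", "Hot", "Normal", "False"), ("Rainy", "Mild", "High", "True"),
]
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(best_one_r(weather, labels))
```

On this data the error counts per attribute come out 4, 5, 4, 5 out of 14, so `best_one_r` picks outlook (index 0), matching the totals in the evaluation table on the next slide.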

Evaluating the Weather Attributes

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Attribute  Rules            Errors  Total errors
Outlook    Sunny -> No      2/5     4/14
           Overcast -> Yes  0/4
           Rainy -> Yes     2/5
Temp       Hot -> No*       2/4     5/14
           Mild -> Yes      2/6
           Cool -> Yes      1/4
Humidity   High -> No       3/7     4/14
           Normal -> Yes    1/7
Windy      False -> Yes     2/8     5/14
           True -> No*      3/6

* indicates a tie

Dealing with Numeric Attributes
- Discretize numeric attributes
- Divide each attribute's range into intervals:
  - Sort instances according to the attribute's values
  - Place breakpoints where the class changes (majority class)
  - This minimizes the total error
- Example: temperature from weather data

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes

The Problem of Overfitting
- This procedure is very sensitive to noise
  - One instance with an incorrect class label will probably produce a separate interval
- Also: a time stamp attribute would have zero errors
- Simple solution: enforce a minimum number of instances in the majority class per interval
- Example (with min = 3):

64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No
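The interval-building with the minimum-majority fix can be sketched as follows. This is one reasonable reading of the procedure, not Weka's exact algorithm: intervals grow until their majority class has at least `min_majority` instances and the class then changes, and a too-small trailing interval is merged into its neighbour.

```python
from collections import Counter

def discretize(classes, min_majority=3):
    """classes: labels in order of the sorted attribute values.
    Close the current interval only when its majority class already has
    at least min_majority instances and the incoming class differs."""
    intervals, current = [], []
    for cls in classes:
        if current:
            majority, count = Counter(current).most_common(1)[0]
            if count >= min_majority and cls != majority:
                intervals.append(current)
                current = []
        current.append(cls)
    if current:
        # merge a trailing interval whose majority is still too small
        if intervals and Counter(current).most_common(1)[0][1] < min_majority:
            intervals[-1].extend(current)
        else:
            intervals.append(current)
    return intervals

temp_classes = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No",
                "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"]
print(discretize(temp_classes, min_majority=3))
```

On the temperature sequence above this yields two intervals instead of the many tiny ones that placing a breakpoint at every class change would produce.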

Discussion of 1R
- 1R was described in a paper by Holte (1993):
  - Contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data)
  - Minimum number of instances was set to 6 after some experimentation
  - 1R's simple rules performed not much worse than much more complex decision trees
- Simplicity first pays off!

"Very Simple Classification Rules Perform Well on Most Commonly Used Datasets," Robert C. Holte, Computer Science Department, University of Ottawa

Discussion of 1R: Hyperpipes
- Another simple technique: build one rule for each class
  - Each rule is a conjunction of tests, one for each attribute
- For numeric attributes: the test checks whether the instance's value is inside an interval
  - Interval given by the minimum and maximum observed in training data
- For nominal attributes: the test checks whether the value is one of a subset of attribute values
  - Subset given by all possible values observed in training data
- Class with the most matching tests is predicted

From Trees to Rules
- Easy: converting a tree into a set of rules
- One rule for each leaf:
  - Antecedent contains a condition for every node on the path from the root to the leaf
  - Consequent is the class assigned by the leaf
- Produces rules that are unambiguous
  - Doesn't matter in which order they are executed
- But: resulting rules are unnecessarily complex
  - Pruning to remove redundant tests/rules
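The one-rule-per-leaf conversion is a straightforward recursive walk. The nested-tuple tree encoding here is hypothetical, chosen just for the sketch; the toy tree is modeled on the weather example.

```python
def tree_to_rules(tree, path=()):
    """Convert a decision tree into one rule per leaf.  A tree is either
    a class label (leaf) or a pair (attribute, {value: subtree})."""
    if not isinstance(tree, tuple):          # leaf: emit accumulated path
        return [(list(path), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules

# toy tree in the style of the weather data
tree = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {"true": "no", "false": "yes"}),
})
for antecedent, consequent in tree_to_rules(tree):
    print(antecedent, "=>", consequent)
```

Each emitted antecedent lists every test on the root-to-leaf path, which is exactly why the slide notes the resulting rules are unambiguous but often redundant and worth pruning.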

From Rules to Trees
- More difficult: transforming a rule set into a tree
  - A tree cannot easily express disjunction between rules
- Example: rules which test different attributes
  - If a and b then x
  - If c and d then x
- Symmetry needs to be broken
  - Corresponding tree contains identical subtrees ("replicated subtree problem")

Rules Summary
- Multiple approaches, but the basic idea is the same: infer simple rules that make the decision based on logical combinations of attributes
- 1R is a good first test
- For simple domains the rules are easy for humans to understand
- Sensitive to noise and overfitting
- Not a good fit for complex domains or large numbers of attributes

Examples in Weka
- Section 4.1 in text