CSE352 Lecture Notes: Classification Introduction. Professor Anita Wasilewska, Computer Science Department, Stony Brook University


PART 1: Classification = Supervised Learning; Building a Classifier
PART 2: Classification Algorithms (Models, Basic Classifiers)
PART 3: Classification by Association
PART 4: Other Classification Methods

Part 1: Classification Introduction
Supervised learning = Classification
Data format: training and test data
Class definitions and class descriptions
Rules learned: characteristic and discriminant
Classification process = building a classifier

Part 1: Classification
Supervised learning = Classification
Building a Classifier: Training and Testing
Evaluating predictive accuracy: the most common methods
Unsupervised learning = Clustering

Part 2: Classification Algorithms (Models, Basic Classifiers)
Decision Trees (ID3, C4.5): descriptive
Neural Networks: statistical
Bayesian Networks: statistical
Rough Sets: descriptive
Genetic Algorithms: descriptive or statistical, but mainly an optimization method
Part 3: Classification by Association: descriptive

Part 4: Other Classification Methods
k-nearest neighbor classifier
Case-based reasoning
Support Vector Machines
Fuzzy sets approaches

Classification Data Format
Classification Data Format: a data table with the key attribute removed. A special attribute, called a class attribute, must be distinguished. The values of the class attribute are called class labels. The class labels are discrete-valued and unordered. Class attributes are categorical in that each value serves as a category, or a class.

Classification Data Format
The records in the classification data are called data tuples with their associated class labels. This means that in a record we distinguish its attribute part and its class part. The attribute part is called a data tuple, attribute vector, data vector, sample, example, instance, or data point (with an associated class label).

Classification Data Example
Example: Data Table with class attribute C

Rec  a1  a2  a3  a4  C
o1   1   1   m   g   c1
o2   0   1   v   g   c2
o3   1   0   m   b   c1

This data consists of tuples (examples, instances):
o1 = (1, 1, m, g) with the class label c1
o2 = (0, 1, v, g) with the class label c2
o3 = (1, 0, m, b) with the class label c1
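To make the format concrete, below is a minimal sketch in Python of the table above: each record is an attribute vector (a data tuple) paired with its class label. The variable names are illustrative, not part of the slides.

```python
# Classification data from the example: attribute vector + class label.
records = [
    ((1, 1, "m", "g"), "c1"),  # o1
    ((0, 1, "v", "g"), "c2"),  # o2
    ((1, 0, "m", "b"), "c1"),  # o3
]

for attribute_vector, class_label in records:
    print(attribute_vector, "->", class_label)
```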

Classification Data 1
Classification Data Format: a data table with the key attribute removed. The special attribute, called a class attribute, is buys_computer.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31-40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31-40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31-40  medium  no       excellent      yes
31-40  high    yes      fair           yes
>40    medium  no       excellent      no

Classification Data 2 (with objects)

rec  age    income  student  credit_rating  buys_computer
r1   <=30   high    no       fair           no
r2   <=30   high    no       excellent      no
r3   31-40  high    no       fair           yes
r4   >40    medium  no       fair           yes
r5   >40    low     yes      fair           yes
r6   >40    low     yes      excellent      no
r7   31-40  low     yes      excellent      yes
r8   <=30   medium  no       fair           no
r9   <=30   low     yes      fair           yes
r10  >40    medium  yes      fair           yes
r11  <=30   medium  yes      excellent      yes
r12  31-40  medium  no       excellent      yes
r13  31-40  high    yes      fair           yes
r14  >40    medium  no       excellent      no

Class Definitions
Syntactically, a class is defined by the class attribute c and its value v. Semantically, a class is defined as a subset of records. A description of a class C defined by the class attribute c and its value v is written as: c = v. Semantically, the classes C1, C2, ... are the sets of all records for which the class attribute c has the value v1, v2, ..., respectively, i.e. C1 = {r : c = v1}, C2 = {r : c = v2}, ...

Class and Class Description
Example: the set of records C = {r1, r2, r6, r8, r14} of Classification Data 2 on the previous slide is a class defined by the class attribute buys_computer and its value no. The description of the class C = {r1, r2, r6, r8, r14} is buys_computer = no, because C = {r : buys_computer = no}. C = {r1, r2, r6, r8, r14} is the class defined by the class description buys_computer = no.

Class Characteristics
A characteristic of a class C = {r : c = v} is a set of non-class attributes a1, a2, ..., ak and their respective values v1, v2, ..., vk such that the intersection of the set of all records for which a1 = v1 & a2 = v2 & ... & ak = vk with the set C is not empty. A characteristic of the class C is written as a1 = v1 & a2 = v2 & ... & ak = vk. REMARK: a class C can have many characteristics, i.e. many characteristic descriptions.

Characteristic Descriptions
Definition: a formula a1 = v1 & a2 = v2 & ... & ak = vk is called a characteristic description for a class C = {r : c = v} if and only if {r : a1 = v1 & a2 = v2 & ... & ak = vk} ∩ C ≠ ∅, i.e. {r : a1 = v1 & a2 = v2 & ... & ak = vk} ∩ {r : c = v} ≠ ∅.
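The definition can be checked mechanically. Below is a sketch in Python (the helper names are ours; only three records of Classification Data 2 are shown for brevity) that tests whether a formula is a characteristic description by checking that the two record sets have a non-empty intersection.

```python
# A few records of Classification Data 2, keyed by record id.
data = {
    "r1": {"age": "<=30", "income": "high", "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    "r6": {"age": ">40",  "income": "low",  "student": "yes", "credit_rating": "excellent", "buys_computer": "no"},
    "r9": {"age": "<=30", "income": "low",  "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
}

def satisfying(conditions):
    """Ids of records whose attributes match all the given attribute=value conditions."""
    return {rid for rid, rec in data.items()
            if all(rec[attr] == val for attr, val in conditions.items())}

def is_characteristic_description(description, class_attr, class_value):
    """True iff {r : description} intersected with {r : class_attr = class_value} is non-empty."""
    return bool(satisfying(description) & satisfying({class_attr: class_value}))

print(is_characteristic_description({"income": "low"}, "buys_computer", "no"))                 # True (via r6)
print(is_characteristic_description({"age": "<=30", "income": "low"}, "buys_computer", "no"))  # False (only r9, which buys)
```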

Characteristic Descriptions
Example: given Classification Data 1 and 2, some of the characteristic descriptions of the class C with description buys_computer = no are:
age = <=30 & income = high & student = no & credit_rating = fair
age = >40 & income = medium & student = no & credit_rating = excellent
age = >40 & income = medium
age = <=30 & student = no & credit_rating = excellent

Characteristic Descriptions
A formula income = low is a characteristic description of the class C1 with description buys_computer = yes and of the class C2 with description buys_computer = no. A formula age = <=30 & income = low is NOT a characteristic description of the class C2 = {r : buys_computer = no} because {r : age = <=30 & income = low} ∩ {r : buys_computer = no} = ∅.

Characteristic Formula
Any formula of the form IF class description THEN characteristics is called a characteristic formula.
Example (given Classification Data 1 and 2):
IF buys_computer = no THEN income = low & student = yes & credit_rating = excellent
IF buys_computer = no THEN income = low & credit_rating = fair

Characteristic Rule
A characteristic formula IF class description THEN characteristics is called a characteristic rule (for a given database) if and only if it is TRUE in the given database, i.e. {r : class description} ∩ {r : characteristics} ≠ ∅.

Classification Data 1
Classification Data Format: a data table with the key attribute removed. The special attribute, called a class attribute, is buys_computer.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31-40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31-40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31-40  medium  no       excellent      yes
31-40  high    yes      fair           yes
>40    medium  no       excellent      no

Characteristic Rule
EXAMPLE: given Classification Data 1 and 2, the formula
IF buys_computer = no THEN income = low & student = yes & credit_rating = excellent
is a characteristic rule for our database because
{r : buys_computer = no} = {r1, r2, r6, r8, r14}
{r : income = low & student = yes & credit_rating = excellent} = {r6, r7}
and {r1, r2, r6, r8, r14} ∩ {r6, r7} = {r6} ≠ ∅.

Characteristic Rule
EXAMPLE: given Classification Data 1 and 2, the formula
IF buys_computer = no THEN income = low & credit_rating = fair
is NOT a characteristic rule for our database because
{r : buys_computer = no} = {r1, r2, r6, r8, r14}
{r : income = low & credit_rating = fair} = {r5, r9}
and {r1, r2, r6, r8, r14} ∩ {r5, r9} = ∅.

Discrimination
Discrimination is the process whose aim is to find rules that allow us to discriminate the objects (records) belonging to a given class from the rest of the records (classes):
IF characteristics THEN class
Example: given Classification Data 1 and 2,
IF age = <=30 & income = high & student = no & credit_rating = fair THEN buys_computer = no

Discriminant Formula
Definition: a discriminant formula is any formula
IF characteristics THEN class
Example: given Classification Data 1 and 2,
IF age = >40 & income = low THEN buys_computer = no

Discriminant Rule
Definition: a discriminant formula IF characteristics THEN class is a DISCRIMINANT RULE (in a given database) if and only if
1. {r : characteristics} is a non-empty set
2. {r : characteristics} ⊆ {r : class}

Discriminant Rule
Example: given Classification Data 1 and 2, the discriminant formula
IF age = >40 & income = low THEN buys_computer = no
is NOT a discriminant rule in our database because
{r : age = >40 & income = low} = {r5, r6}
is not a subset of the set {r : buys_computer = no} = {r1, r2, r6, r8, r14}.
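The same test can be sketched in Python (the function name and the set encoding of records are ours): a discriminant formula is a discriminant rule only when its characteristic set is non-empty and contained in the class set.

```python
def is_discriminant_rule(characteristic_set, class_set):
    # Condition 1: non-empty; condition 2: subset of the class set.
    return bool(characteristic_set) and characteristic_set <= class_set

# Record sets read off Classification Data 2:
age_gt40_income_low = {"r5", "r6"}                     # {r : age = >40 & income = low}
buys_computer_no    = {"r1", "r2", "r6", "r8", "r14"}  # {r : buys_computer = no}

print(is_discriminant_rule(age_gt40_income_low, buys_computer_no))  # False: r5 buys a computer
```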

Characteristic and Discriminant Rules
The inverse implication of a characteristic rule is usually NOT a discriminant rule.
Example: the inverse implication of the characteristic rule
IF buys_computer = no THEN income = low & student = yes & credit_rating = excellent
is
IF income = low & student = yes & credit_rating = excellent THEN buys_computer = no
The above rule is NOT a discriminant rule, as it cannot discriminate between the classes with descriptions buys_computer = no and buys_computer = yes (see records r6 and r7 in our Data 2).

Supervised Learning Goal (1)
Given a data set and a class C defined in a given classification dataset, the supervised learning goal is to FIND a minimal (or as small as possible) set of characteristic and/or discriminant rules, or other descriptions, for the class C, or for (all) other classes. When we find RULES we talk about descriptive supervised learning.

Supervised Learning Goal (2)
We also want the found rules to involve as few attributes as possible, i.e. we want the rules to have descriptions that are as short as possible.

Supervised Learning
The process of CREATING (learning) discriminant and/or characteristic rules, or other descriptions, and TESTING them is called a supervised learning process. When the process (see the Learning Process slide) is finished, we say that the classification has been learned and tested from examples (records in the classification dataset). It is called supervised learning because we know the class labels of all data examples.

The Learning Process (LP)
[Diagram] Data → (Selection) → Target Data → (Cleaning / Preprocessing) → Processed Data → Transformed Data → (Learning) → Rules or Descriptions → (Testing and Evaluation) → Knowledge

Classification Data 1
Classification Data Format: a data table with the key attribute removed. The special attribute, called a class attribute, is buys_computer.

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31-40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31-40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31-40  medium  no       excellent      yes
31-40  high    yes      fair           yes
>40    medium  no       excellent      no

A Small, Full Set of DISCRIMINANT RULES for the classes buys_computer = yes and buys_computer = no
The rules are:
IF age = <=30 AND student = no THEN buys_computer = no
IF age = <=30 AND student = yes THEN buys_computer = yes
IF age = 31-40 THEN buys_computer = yes
IF age = >40 AND credit_rating = excellent THEN buys_computer = no
IF age = <=30 AND credit_rating = fair THEN buys_computer = yes
Exercise: verify all of them against Data 1 and 2.

Testing and Classifying
In order to use the discovered rules for testing, and later, when testing is finished and the predictive accuracy is acceptable, for future classification, we write the rules in the following predicate form:
IF age(x, <=30) AND student(x, no) THEN buys_computer(x, no)
IF age(x, <=30) AND student(x, yes) THEN buys_computer(x, yes)
The attributes and their values of a new record x are matched with the IF part of a rule, and the record is classified according to the THEN part of the rule.
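A minimal sketch of this matching step in Python (the dict encoding of the IF parts and the function name are ours, not the slides'):

```python
# Each rule: (IF part as attribute=value conditions, predicted class label).
rules = [
    ({"age": "<=30", "student": "no"},  "no"),
    ({"age": "<=30", "student": "yes"}, "yes"),
]

def classify(record):
    for if_part, label in rules:
        if all(record.get(attr) == val for attr, val in if_part.items()):
            return label  # the THEN part gives the predicted class
    return None  # no rule matches: the record cannot be classified

x = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(x))  # yes
```

When the IF parts of several rules match the same record, a convention is needed; the accuracy example later in these notes counts a record as correct if any matching rule is correct.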

Testing and Training
The Test Dataset has the same format as the Training Dataset, i.e. in both datasets the values of the class attribute are known. The Test Dataset and the Training Dataset are disjoint sets. We use the Test Dataset to evaluate the predictive accuracy of our discovered set of rules.

Predictive Accuracy
The PREDICTIVE ACCURACY of a set of rules, or of any other result of a classification algorithm, is the percentage of correctly classified data in the Test Dataset:
predictive accuracy = (number of correctly classified test records / number of test records) × 100%
If the predictive accuracy is not high enough, we choose different training and testing datasets and start the learning process again. There are many methods of training and testing; they will be discussed later.

Classification Data
Classification Data Format: a data table with the key attribute removed. A special attribute, called a class attribute, must be distinguished. The values C1, C2, ..., Cn of the class attribute C are called class labels.
Exercise: for the database below, write 2 discriminant rules and 3 characteristic rules, and PROVE that they are what you claim.

Obj  a1  a2  a3  a4  C
o1   1   1   m   g   c1
o2   0   1   v   g   c2
o3   1   0   m   b   c1

Classification and Classifiers
An algorithm (model, method) is called a classification algorithm if it uses the classification data to build a set of patterns: discriminant and/or characteristic rules, or other pattern descriptions. These patterns are structured in such a way that we can use them to classify unknown sets of objects: unknown tuples, records.

Classification and Classifiers
Because we can use the discovered patterns to classify unknown sets of objects, a classification algorithm is often called, for short, a classifier. Remember that the name classifier implies more than just a classification algorithm: a classifier is the final product of a process that uses a data set and a classification algorithm.

Building a Classifier
Building a classifier consists of two phases: training and testing. In both phases we use a training data set and a test data set disjoint from it, for both of which the class labels are known for all of the records.

Building a Classifier
We use the training data set to create patterns (rules, trees) or to train a Neural or Bayesian network. We evaluate the created patterns with the use of the test data. The measure for a trained classifier is called predictive accuracy. The classifier is built, i.e. we terminate the process, when it has been trained and tested and the predictive accuracy is at an acceptable level.

Classifiers' Predictive Accuracy
The PREDICTIVE ACCURACY of a classifier is the percentage of correctly classified data in the test data set. PREDICTIVE ACCURACY depends heavily on the choice of the test and training data sets. There are many methods of choosing test and training sets, and hence of evaluating predictive accuracy. Basic methods are presented in the Testing Classifiers lecture.

Correctly and Incorrectly Classified Records
A record is correctly classified if and only if the following conditions hold:
(1) we can classify the record, i.e. there is a rule whose LEFT side matches the record, and
(2) the classification determined by the rule is correct, i.e. the RIGHT side of the rule matches the value of the record's class attribute.
OTHERWISE the record is not correctly classified. Words used: not correctly = incorrectly = misclassified.

Exercise 1
Assume that we have the following set of rules:
R1: a1=1 /\ a2=0 => class = yes
R2: a1=0 /\ a2=3 => class = no
R3: a2=1 => class = yes
The TEST data has the following 6 records, where the attributes are a1, a2, class:
r1 = (1, 0) record, (yes) associated class label
r2 = (0, 3), (yes)
r3 = (1, 1), (no)
r4 = (2, 1), (yes)
r5 = (3, 1), (yes)
r6 = (1, 2), (no)
WRITE the rules in predicate form and CALCULATE the predictive accuracy of this set of rules with respect to the 6 TEST records above.

Exercise 2
Evaluate the predictive accuracy of the set of rules:
R1: IF age = <=30 AND student = no THEN buys_computer = no
R2: IF age = <=30 AND student = yes THEN buys_computer = yes
R3: IF age = 31-40 THEN buys_computer = yes
R4: IF age = >40 AND credit_rating = excellent THEN buys_computer = no
R5: IF age = <=30 AND credit_rating = fair THEN buys_computer = yes
with respect to the TEST data on the next slide.
REMARK: you must FIRST re-write the rules in predicate form.

TEST DATA for Exercise 2

rec  age    income  student  credit_rating  buys_computer
r1   <=30   low     no       fair           yes
r2   <=30   high    yes      excellent      no
r3   <=30   high    no       fair           yes
r4   31-40  medium  yes      fair           yes
r5   >40    low     yes      fair           yes
r6   >40    low     yes      excellent      yes
r7   31-40  high    yes      excellent      yes
r8   <=30   medium  no       fair           no
r9   31-40  low     no       excellent      yes
r10  >40    medium  yes      fair           yes

Predictive Accuracy
For our 10 TEST records and 5 rules R1, R2, ..., R5:
Record r1 is well classified by rule R5
Record r2 is misclassified
Record r3 is well classified by rule R5
Record r4 is well classified by rule R3
Record r5 is misclassified
Record r6 is misclassified
Record r7 is well classified by rule R3
Record r8 is well classified by rule R1
Record r9 is well classified by rule R3
Record r10 is misclassified
We have 6 correctly classified records out of 10, so the predictive accuracy is 60%.
Exercise: prove that rules R1, R2, ..., R5 are TRUE in Classification Data 1 and 2.
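The computation above can be reproduced mechanically. Below is a sketch in Python (the encoding is ours); following the slide's bookkeeping, a record counts as correctly classified when at least one rule whose IF part matches it predicts the record's true label (this is how r1 and r8 are credited to R5 and R1 even though another rule also matches them).

```python
# Rules R1..R5 and the 10 test records from the previous slides.
rules = [
    ({"age": "<=30",  "student": "no"},              "no"),   # R1
    ({"age": "<=30",  "student": "yes"},             "yes"),  # R2
    ({"age": "31-40"},                               "yes"),  # R3
    ({"age": ">40",   "credit_rating": "excellent"}, "no"),   # R4
    ({"age": "<=30",  "credit_rating": "fair"},      "yes"),  # R5
]

test_data = [  # (attributes, true class label) for r1..r10
    ({"age": "<=30",  "income": "low",    "student": "no",  "credit_rating": "fair"},      "yes"),  # r1
    ({"age": "<=30",  "income": "high",   "student": "yes", "credit_rating": "excellent"}, "no"),   # r2
    ({"age": "<=30",  "income": "high",   "student": "no",  "credit_rating": "fair"},      "yes"),  # r3
    ({"age": "31-40", "income": "medium", "student": "yes", "credit_rating": "fair"},      "yes"),  # r4
    ({"age": ">40",   "income": "low",    "student": "yes", "credit_rating": "fair"},      "yes"),  # r5
    ({"age": ">40",   "income": "low",    "student": "yes", "credit_rating": "excellent"}, "yes"),  # r6
    ({"age": "31-40", "income": "high",   "student": "yes", "credit_rating": "excellent"}, "yes"),  # r7
    ({"age": "<=30",  "income": "medium", "student": "no",  "credit_rating": "fair"},      "no"),   # r8
    ({"age": "31-40", "income": "low",    "student": "no",  "credit_rating": "excellent"}, "yes"),  # r9
    ({"age": ">40",   "income": "medium", "student": "yes", "credit_rating": "fair"},      "yes"),  # r10
]

def correctly_classified(record, true_label):
    """True iff some rule matches the record and predicts its true label."""
    predictions = [label for if_part, label in rules
                   if all(record[attr] == val for attr, val in if_part.items())]
    return true_label in predictions

correct = sum(correctly_classified(rec, label) for rec, label in test_data)
print(f"Predictive accuracy: {100 * correct / len(test_data):.0f}%")  # 60%
```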

Classification Process: a Classifier (Book slide)
[Diagram] Training Data → Classification Algorithm → Classifier (Model, Rules)

Training Data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model, Rules):
IF rank = professor THEN tenured = yes
IF years > 6 THEN tenured = yes

Testing and Prediction (Book slide)
[Diagram] The Classifier is applied to the Testing Data, and then to Unseen Data, e.g. (Jeff, Professor, 4): Tenured?

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Supervised vs. Unsupervised Learning
Supervised learning (classification): Supervision means that the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations. New data is classified based on a tested classifier.

Supervised vs. Unsupervised Learning
Unsupervised learning (clustering): the class labels of the training data are unknown. We are given a set of records (measurements, observations, etc.) with the aim of establishing the existence of classes or clusters in the data.