
Data Structures
Notes for Lecture 13: Techniques of Data Mining
By Ass. Prof. Dr. Samaher Al_Janabi, 2017-2018

Classification: Basic Concepts

1. Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class, the task is to find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it.

2. Illustrating Classification Task
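As a minimal illustration of this task, the sketch below divides a small made-up data set into training and test portions, builds a classifier on the training portion, and checks its accuracy on the test portion. It assumes the scikit-learn library is available; the records, attribute values, and class labels are invented purely for illustration and are not from the lecture.

# Minimal sketch: training/test split and classification (assumes scikit-learn).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Invented records: each row is one record's attribute values, y holds the class.
X = [[25, 40000], [47, 25000], [52, 110000], [33, 60000],
     [61, 52000], [29, 30000], [45, 90000], [38, 75000]]
y = ["no", "no", "yes", "no", "yes", "no", "yes", "yes"]

# Divide the given data set into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Build the model on the training set ...
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# ... and use the test set to estimate accuracy on previously unseen records.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))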

3. Classification Techniques
- Decision Tree based Methods
- Rule-based Methods
- Memory based reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

3.1. Decision Tree
Decision trees are one of the fundamental techniques used in data mining. They are tree-like structures used for classification, clustering, feature selection, and prediction. Decision trees are easily interpretable and intuitive for humans, are well suited to high-dimensional applications, are fast, and usually produce high-quality solutions. Decision tree objectives are consistent with the goals of data mining and knowledge discovery. This lecture reviews the concept of decision trees in data mining.
A decision tree consists of a root and internal nodes. The root and the internal nodes are labeled with questions in order to find a solution to the problem under consideration. The root node is the first state of a DT, and all of the examples from the training data are assigned to it. If all examples belong to the same group, no further decisions are needed to split the data set. If the examples in this node belong to two or more groups, a test is made at the node that results in a split. A DT is binary if each node is split into two parts, and it is nonbinary (multi-branch) if each node is split into three or more parts.
A decision tree model consists of two parts: creating the tree and applying the tree to the database. To achieve this, decision trees use several different algorithms; the algorithms most widely used by computer scientists are ID3, C4.5, and C5.0.
Example: (figure)
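To make this structure concrete, one possible in-memory representation of such a tree is sketched below in Python. The class and field names (DTNode, attribute, children) are assumptions made for illustration, and the small hand-built tree uses the classic weather data rather than anything from these notes. A leaf stores a class label; an internal node stores the attribute it tests and one child per outcome, so a node with three or more children gives a nonbinary (multi-branch) tree.

# Illustrative decision tree node; all names are assumptions, not a standard API.
class DTNode:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # attribute tested at this node (None for a leaf)
        self.label = label           # class label (meaningful only for a leaf)
        self.children = {}           # maps an outcome value to a child DTNode

    def is_leaf(self):
        return self.attribute is None

# A tiny hand-built tree: the root tests "outlook", with one branch per outcome.
root = DTNode(attribute="outlook")
root.children["sunny"] = DTNode(label="no")
root.children["overcast"] = DTNode(label="yes")
root.children["rain"] = DTNode(label="yes")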

Another example of a decision tree: (figure)

We can construct a decision tree from a set T of training cases as follows. Let the classes be denoted by {C1, C2, ..., Cn}. There are three possibilities:
(i) T contains one or more cases, all belonging to a single class Cj. The decision tree for T is a leaf identifying class Cj.
(ii) T contains no cases. The decision tree is again a leaf, but the class to be associated with the leaf must be determined from sources other than T.
(iii) T contains cases that belong to a mixture of classes. T is partitioned into subsets T1, T2, ..., Tk, where Ti contains all cases in T that have outcome Oi of the chosen test. The decision tree for T consists of a decision node identifying the test and one branch for each possible outcome. This process is applied recursively to each subset of the training cases, so that the ith branch leads to the decision tree constructed from the subset Ti (a code sketch of this recursion is given after the list below).
Generally, a decision tree algorithm is most relevant to the third case, where it can be stated informally as follows:
- From the training data set, identify a target variable and a set of input variables.
- Examine each input variable one at a time: create two or more groupings of its values, and measure how similar items are within each group and how different items are between groups. Select the grouping that maximizes similarity within groupings and differences between groupings.
- Once the groupings have been calculated for each input variable, select the single input variable that maximizes similarity within groupings and differences between groupings.
- Repeat this process in each group that contains a convincing percentage of the information in the original data; the process terminates only when all divisible groups have been divided.
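A rough Python sketch of this recursive procedure is given below. It reuses the DTNode class from the earlier sketch and assumes a helper choose_test(cases, attributes) that picks the attribute to split on (for example, by the gain or gain ratio criteria described next); every name here is illustrative rather than part of the lecture material.

from collections import Counter

def build_tree(cases, attributes, default_label=None):
    # cases is a list of (record, class_label) pairs; record maps attribute -> value.
    if not cases:
        # (ii) T contains no cases: the class must be determined from sources
        # other than T, e.g. the majority class of the parent node.
        return DTNode(label=default_label)
    labels = [label for _, label in cases]
    if len(set(labels)) == 1:
        # (i) all cases belong to a single class Cj: the tree is a leaf for Cj.
        return DTNode(label=labels[0])
    # (iii) mixed classes: choose a test, partition T, and recurse on each subset.
    attribute = choose_test(cases, attributes)          # assumed helper
    node = DTNode(attribute=attribute)
    majority = Counter(labels).most_common(1)[0][0]
    for value in {record[attribute] for record, _ in cases}:
        subset = [(r, c) for r, c in cases if r[attribute] == value]
        remaining = [a for a in attributes if a != attribute]
        node.children[value] = build_tree(subset, remaining, default_label=majority)
    return node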

3.1.1. ID3 Algorithm
Below is the decision tree algorithm for ID3, which describes the general layout of DT algorithms. This algorithm uses the gain ratio as the evaluating test. The gain criterion selects a test to maximize the mutual information between the test and the class. The process of determining the gain for a test is as follows.
Imagine selecting one case at random from a set S of cases and announcing that it belongs to some class Cj. Let freq(Cj, S) denote the frequency (count) of class Cj cases in set S, so that this message has probability
    freq(Cj, S) / |S|.
The information the message conveys is therefore
    -log2( freq(Cj, S) / |S| ) bits.
The expected information from such a message pertaining to class membership is the sum over the classes, in proportion to their frequencies in S; that is,
    Info(S) = - Σ_j ( freq(Cj, S) / |S| ) × log2( freq(Cj, S) / |S| ).
When applied to the set of training cases, Info(T) measures the average amount of information needed to identify the class of a case in set T. This quantity is also known as the entropy of the set T.
Now consider a similar measurement after T has been partitioned into subsets T1, ..., Tn in accordance with the n outcomes of a test X. The expected information requirement is the weighted sum over the n subsets:
    Info_X(T) = Σ_i ( |Ti| / |T| ) × Info(Ti).
The quantity
    gain(X) = Info(T) - Info_X(T)
measures the information that is gained by partitioning T in accordance with the test X.
Even though the gain criterion yields good results, it has a serious deficiency: it is biased towards tests with many outcomes. The bias can be corrected by normalizing the apparent gain of tests. By analogy, the split info is defined by
    split info(X) = - Σ_i ( |Ti| / |T| ) × log2( |Ti| / |T| ).
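These quantities translate almost directly into code. The sketch below computes Info(T), Info_X(T), and the gain for a candidate attribute, using the same (record, class_label) representation as the earlier sketches; it is an illustrative reading of the formulas above, not code from the lecture.

from math import log2
from collections import Counter

def info(cases):
    # Info(T): entropy of the class distribution in T, in bits.
    total = len(cases)
    counts = Counter(label for _, label in cases)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def info_x(cases, attribute):
    # Info_X(T): weighted entropy after partitioning T on the outcomes of test X.
    total = len(cases)
    weighted = 0.0
    for value in {record[attribute] for record, _ in cases}:
        subset = [(r, c) for r, c in cases if r[attribute] == value]
        weighted += (len(subset) / total) * info(subset)
    return weighted

def gain(cases, attribute):
    # gain(X) = Info(T) - Info_X(T)
    return info(cases) - info_x(cases, attribute)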

This represents the potential information generated by dividing T into n subsets, whereas the information gain measures the information relevant to classification that arises from the same division. Then
    gain ratio(X) = gain(X) / split info(X)
expresses the proportion of the information generated by the split that is useful for classification. The gain ratio criterion selects a test to maximize this ratio, subject to the constraint that the information gain must be large, at least as large as the average gain over all tests examined.

ID3 Decision Tree Algorithm
Given: Examples (S); Target attribute (C); Attributes (R)
Initialize Root

Function ID3(S, C, R)
  Create a Root node for the tree
  IF S is empty, return a single node with value Failure
  IF every example in S has the same value of the target attribute C, return a single node with that value
  IF R is empty, return a single node with the most frequent value of the target attribute C
  ELSE BEGIN
    Let D be the attribute with the largest GainRatio(D, S) among the attributes in R
    Let {dj | j = 1, 2, ..., n} be the values of attribute D
    Let {Sj | j = 1, 2, ..., n} be the subsets of S consisting respectively of the records with value dj for attribute D
    Return a tree with root labeled D and arcs d1, d2, ..., dn going respectively to the subtrees:
      For each branch j, IF Sj is empty, add a leaf with the most frequent value of C;
      ELSE attach the subtree ID3(Sj, C, R - {D})
  END ID3
Return Root
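Continuing the same illustrative sketch, split info and the gain ratio can be added on top of the functions above, and a choose_test of the kind assumed earlier can then pick the attribute with the best gain ratio among those whose gain is at least average. This is one plausible reading of the criterion, not a faithful reimplementation of ID3 or C4.5.

from math import log2
from collections import Counter

def split_info(cases, attribute):
    # split info(X): potential information generated by dividing T into the subsets.
    total = len(cases)
    sizes = Counter(record[attribute] for record, _ in cases).values()
    return -sum((s / total) * log2(s / total) for s in sizes)

def gain_ratio(cases, attribute):
    s = split_info(cases, attribute)
    return gain(cases, attribute) / s if s > 0 else 0.0

def choose_test(cases, attributes):
    # Prefer the largest gain ratio, restricted to attributes whose gain is at
    # least as large as the average gain over the attributes examined.
    gains = {a: gain(cases, a) for a in attributes}
    average = sum(gains.values()) / len(gains)
    candidates = [a for a in attributes if gains[a] >= average]
    return max(candidates, key=lambda a: gain_ratio(cases, a))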

3.1.2. Decision Tree Classification Task
A. Apply Model to Test Data
(Figure panels (A)-(F) illustrate applying the model to test data.)
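With the hand-built DTNode representation sketched earlier, applying the model to a test record simply means walking from the root down the branch that matches the record's value for each tested attribute until a leaf assigns a class. A minimal sketch, again with illustrative names only:

def classify(node, record):
    # Follow the branch for the record's value of the attribute tested at each
    # decision node until a leaf is reached; the leaf's label is the prediction.
    while not node.is_leaf():
        value = record.get(node.attribute)
        if value not in node.children:
            # No branch for this outcome, so no confident prediction can be made.
            return None
        node = node.children[value]
    return node.label

# Applying the tiny tree built in Section 3.1 to two test records:
print(classify(root, {"outlook": "sunny"}))      # -> no
print(classify(root, {"outlook": "overcast"}))   # -> yes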