Investigating the Performance of Naive Bayes Classifiers and K-Nearest Neighbor Classifiers


RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR

Investigating the Performance of Naive Bayes Classifiers and K-Nearest Neighbor Classifiers

Mohammed Jahirul Islam
Dept. of Elec. & Comp. Engineering
University of Windsor, Windsor, ON, Canada

Presentation Outline
- Overview of Classification
- Problem Statement and Motivation
- Literature Review
- Bayes Classifier
- Naïve Bayes Classifier
- K-Nearest Neighbor Classifier
- Application of Classifiers: Credit Card Approval
- Experimental Results
- Conclusion, Comments

Classification: Overview
- The goal of machine learning is to program computers to use example data or past experience to solve a given problem.
- Classification is an application of machine learning: it takes raw data and assigns it to a particular class based on the required parameter set.
- Selecting the right classification algorithm for a machine learning task is a major issue.

Classification Scheme
- The selection of a classifier depends on the application and on the information available from that application.
- Machine learning uses the theory of statistics to build mathematical models for classification, because the core task is making inferences from a sample.
- Inference is therefore central.

Problem Statement
Key questions:
- Is there any way to generalize classification techniques?
- How do we determine which technique is suitable for a specific problem?
- How can a specific classifier be improved by changing its parameters for a specific application?
Investigating the performance of the classifiers is one way to work toward that goal.

Literature Review
- A wide range of algorithms is available for classification, from Bayesian classifiers to more powerful neural networks.
- Bayesian theory works as a framework for making decisions under uncertainty: a probabilistic approach to inference.
- The probability of future events can be calculated from their earlier frequency: to see the future, look at the past.
- Predictions are based completely on data culled from reality; the more data obtained, the better it works.
- Bayesian models are self-correcting: when the data change, so do the results.

Literature Review (cont'd)
- In classification, Bayes' rule is used to calculate the probabilities of the classes; a central issue is how to classify raw data rationally so as to minimize expected risk.
- What if the dimensionality of the input is very high?
- The Naïve Bayes classifier is one of the most widely used practical Bayesian learning methods. It is very effective when the dimensionality of the input is very high; in some domains its performance is comparable to that of neural networks.
- The K-Nearest Neighbor algorithm is the most basic instance-based method: store the training instances in a lookup table and interpolate from them.

Bayesian Theory
- The most practical learning approach for many learning problems; based on evaluating explicit probabilities for hypotheses.
- Bayes' theorem states that

  P(h|D) = P(D|h) P(h) / P(D)

  where
  P(h): prior probability of hypothesis h (the prior)
  P(D): prior probability of the training data D (the evidence)
  P(D|h): probability of D given h (the likelihood)
  P(h|D): probability of h given D (the posterior)
- The posterior probability of each class h_i is calculated and the best hypothesis h_MAP is selected: the maximum a posteriori (MAP) hypothesis (see the sketch below).
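As a concrete illustration, here is a minimal Python sketch of MAP selection. The numbers reuse the class counts and the A9 = 'g' probabilities reported later in this talk; the function and variable names are illustrative, not the presenter's.

```python
# MAP selection: P(D) is a common factor across hypotheses, so it suffices
# to maximize the unnormalized posterior P(D|h) * P(h).

priors = {"accept": 215 / 470, "reject": 255 / 470}  # P(h) from training class counts
likelihoods = {"accept": 0.9581, "reject": 0.9020}   # P(D|h) for observing A9 = 'g'

def map_hypothesis(priors, likelihoods):
    """Return the hypothesis h that maximizes P(D|h) * P(h)."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

print(map_hypothesis(priors, likelihoods))  # -> 'reject' (0.489 vs. 0.438)
```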

Naïve Bayes Classifier
- Requires only a small amount of training data to estimate the parameters necessary for classification.
- A highly practical Bayesian learning method, particularly suited to problems where the dimensionality of the input is very high.
- Assumption: the attribute values are conditionally independent given the target value.
- It ignores possible dependencies, such as correlations among inputs, reducing a multivariate problem to a group of univariate problems (a sketch of the resulting decision rule follows).
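Under the independence assumption the posterior factorizes into one term per attribute. Here is a minimal sketch of the resulting decision rule for categorical attributes like those in the credit data; the dict-based data layout is an assumption of mine, and log-probabilities are used only to avoid floating-point underflow.

```python
import math

def naive_bayes_predict(record, priors, cond_tables):
    """record: dict mapping attribute name -> observed value.
    cond_tables[attr][cls][value] holds P(value | cls) estimated from training data.
    Sums log-probabilities rather than multiplying, to avoid underflow."""
    best_cls, best_score = None, float("-inf")
    for cls, prior in priors.items():
        score = math.log(prior)
        for attr, value in record.items():
            score += math.log(cond_tables[attr][cls][value])
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls
```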

K-Nearest Neighbor Classifier
- In parametric methods, we assume a model that is valid over the whole input space. In practice this assumption often does not hold, and we may incur a large error when it fails. The solution?
- In nonparametric estimation we assume that similar inputs have similar outputs; no model is fitted to the data.
- Prediction is based on memory, i.e., the training data; such methods are called instance-based or memory-based learning algorithms.
- KNN is an instance-based classifier.

KNN Classifier
- KNN is the most basic instance-based learning method: a new query instance is classified by the majority category of its k nearest neighbors (sketch below).
- Assumption: the world is smooth and functions change slowly, so similar past instances found in the training set are informative.
- Requires a suitable distance measure and a choice of k.
- It is common to select k small and odd to break ties (typical values 1, 3, 5); larger k values help reduce the effect of noisy points.
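A minimal sketch of that majority-vote rule, using the per-attribute mismatch distance described later in the talk (1 for a mismatch, 0 for a match); helper names are mine.

```python
from collections import Counter

def mismatch_distance(a, b):
    """Count the attributes on which two records disagree (1 per mismatch)."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_predict(query, training_set, k):
    """training_set: list of (record, label) pairs.
    Returns the majority label among the k nearest training records."""
    nearest = sorted(training_set, key=lambda ex: mismatch_distance(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```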

Implementation, Training and Testing
- Naïve Bayes and KNN classifiers are implemented and applied to a credit card approval application.
- It is important for a bank or financial institution to be able to predict in advance the risk associated with a loan: the probability that the customer will default and not pay the whole amount back.
- The goal is to make sure that the bank makes a profit while not inconveniencing a customer beyond his or her financial capacity.
- Usually, the information about the customer includes income, savings, collateral, profession, age, past financial history, and so forth.

Story Behind the Datafile
- Source of the datafile: ftp://ftp.ics.uci.edu/pub/machine-learning-databases/creditscreening
- All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data.
- The file contains information about 671 applicants and whether they were approved or rejected.
- Each application is described by 9 attributes and classified as approved (+) or rejected (-).
- Each of the 9 attributes is a one-letter symbol that is shorthand for a more meaningful English-language description.

Attributes of the Datafile

Attribute   Values
A1          b, a
A2          u, y
A3          g, p, h
A4          i, k, c, g, q, d, a, m, x, w, j, r, e, b
A5          h, v, f, d, b, j, z, m, o
A6          t, f
A7          t, f
A8          t, f
A9          g, p, s

Experimental Results
- Training set: 470 examples; testing set: 201 examples.
- For the Naïve Bayes classifier, conditional probability tables are first constructed for attributes A1 to A9 from the training set (a sketch of the estimation step follows).
- A sample table is shown for attribute A9.

Table: P(A9 | accept) and P(A9 | reject); total accept: 215, reject: 255

A9   Accept   Reject
g    0.9581   0.9020
p    0.0047   0.0039
s    0.0372   0.0941
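A minimal sketch of how one such table can be estimated as class-conditional relative frequencies; the (record, label) layout of the training data is an assumption, and no smoothing is applied since the slides do not mention any.

```python
from collections import Counter

def conditional_table(training_set, attr_index, cls):
    """Estimate P(value | cls) for one attribute as the relative frequency
    of each value among training records belonging to class cls."""
    values = [record[attr_index] for record, label in training_set if label == cls]
    counts = Counter(values)
    return {value: n / len(values) for value, n in counts.items()}
```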

Experimental Results (Naïve Bayes)
- The test set is classified based on the probabilities estimated from the training set.
- Each example is picked from the testing set and its class is predicted; the predicted class is compared to the target value given in the test set. A mismatch counts as an error (this counting step is sketched below).
- Classification results, total testing examples: 201

Classification   Number   %
Correct          176      87.57
Incorrect        25       12.43
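The mismatch-counting step amounts to the following sketch, where predict stands for either classifier:

```python
def error_rate(test_set, predict):
    """test_set: list of (record, target) pairs; predict: record -> class.
    Returns the fraction of test examples whose prediction mismatches the target."""
    errors = sum(1 for record, target in test_set if predict(record) != target)
    return errors / len(test_set)
```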

Results: KNN
- Different values of k are tried, for example k = 1, 3, 5, 11, 51, 101.
- The distance metric is Euclidean distance over per-attribute differences: for a mismatch the distance is set to 1, else 0 (for ranking neighbors this is equivalent to counting mismatches).
- For each value of k, the k training examples with the smallest distances are picked and their accept/reject votes are counted.
- The class with the larger vote count is the predicted value for the testing example; it is compared to the target value, and a mismatch counts as an error (the experiment loop is sketched below).
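A sketch of the experiment loop, reusing knn_predict and error_rate from the earlier sketches; training_set and test_set are assumed to be lists of (record, label) pairs.

```python
# Report the test-set error rate for each k value tried in the talk.
for k in (1, 3, 5, 11, 51, 101):
    err = error_rate(test_set, lambda record: knn_predict(record, training_set, k))
    print(f"k = {k:3d}   error rate = {err:.2%}")
```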

Results: KNN (cont'd)
- For different values of k, the testing set is classified.

Percentage correct and incorrect for different k (KNN, 201 testing examples):

k     Correct (%)   Incorrect (%)
1     80.10         19.90
3     85.57         14.43
5     90.55         9.45
11    86.57         13.43
51    86.07         13.93
101   85.57         14.43

Results: KNN (cont'd)
[Chart: percentage of error for different values of k (KNN, 201 testing examples)]

Comparative Statement (KNN and Naïve Bayes)
- The Naïve Bayes and KNN classifiers are compared in terms of correct classification and misclassification rates.
- Classification results, total testing examples: 201

Classifier    Correct (Number / %)   Misclassified (Number / %)
Naïve Bayes   176 / 87.57            25 / 12.43
KNN (k=5)     182 / 90.55            19 / 9.45

Conclusions (Bayesian)
- A credit card approval application was selected for investigating the performance of two widely used classifiers: Naïve Bayes and KNN.
- The result of Bayesian inference depends strongly on the prior probabilities.
- Bayes' theorem provides a principled way to calculate the posterior probability of each hypothesis given the training data and to select the most probable one.

Conclusions (Naïve Bayes, KNN)
- The Naïve Bayes classifier applied to the credit card approval test set yields a classification error of 12.43%.
- Instance-based methods are sometimes referred to as lazy learning methods, because they delay processing until a new instance must be classified.
- In KNN, the selection of k is application dependent. To simplify the problem, k was fixed to an odd number so that no tie can happen.
- At k = 5 the misclassification rate is 9.45%, the minimum observed, so k = 5 is the best value for this application.

References
[1] ftp://ftp.ics.uci.edu/pub/machine-learning-databases/creditscreening
[2] E. Alpaydin. An Introduction to Machine Learning. The MIT Press, Cambridge, Massachusetts, 2004.
[3] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:21-27, 1967.
[4] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2nd ed.
[5] S. Eyheramendy, D. Lewis, and D. Madigan. On the naive Bayes model for text categorization. Proceedings of Artificial Intelligence and Statistics, 2003.
[6] D. Lewis. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. AT&T Labs Research, NJ, USA.
[7] T. Mitchell. Machine Learning. McGraw-Hill.
[8] I. Rish. An empirical study of the naive Bayes classifier. Proceedings of IJCAI-01, 2001.
[9] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, 1995.
[10] K. Weise and W. Wöger. Comparison of two measurement results using the Bayesian theory of measurement uncertainty. Measurement Science and Technology, 5:879-882, 1994.
[11] Q. J. Wu. Class Notes: Machine Learning and Computer Vision. University of Windsor, Windsor, ON, Canada, 2007.

Thanks for your patience. Questions?