Evaluating learning algorithms


Questions
- Since induction is fallible, one must be able to assess its reliability.
- Typical questions:
  - What is the true performance of my (learned) classification rule?
  - Is my learning algorithm better than this other one?

AgroParisTech
(based in part on Sebastian Thrun's CMU class and on Padraig Cunningham's tutorial at ECML-09)
Evaluating ML algorithms

Outline
1. Measuring the error rate
2. Confusion matrices and various performance criteria
3. The ROC curve

Estimating the true error rate

Evaluating classification rules

Various sets of data:
- The whole available data set
- Large data sample: learning set / validation set / test set
- Very small data sample
- Unlimited sample

Asymptotic behaviour (ideal case)

Over-fitting (over-learning)
- Useful for very large data sets.
[Figure: training-set error and test-set error as a function of training time; the training error keeps decreasing while the test error rises again (over-fitting), so training should be stopped where the test error is minimal.]

Over-fitting (NNs)
[Figure: curves for ... examples.]

Why use a test set?
- The control parameters of the learning algorithm (e.g. the number of hidden layers, the number of neurons, ...) are tuned so as to reduce the error on the validation set.
- To obtain an estimate of the error that is not optimistically biased, one must therefore measure it on an independent data set: the test set.

Evaluating the error rate
- True error (real risk), where D is the true distribution:
  $e_D = \int \mathbf{1}[y \neq f(x, \theta)] \, p(x, y) \, dx \, dy$
- Test error (empirical risk), where m is the number of test examples and T is the test data:
  $\hat{e}_T = \frac{1}{m} \sum_{(x,y) \in T} \mathbf{1}[y \neq f(x, \theta)]$
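The empirical risk above is simply the fraction of misclassified test examples. A minimal sketch, with a hypothetical test set:

```python
def empirical_error(y_true, y_pred):
    """Fraction of test examples on which the prediction differs from the label."""
    return sum(yt != yp for yt, yp in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test set of m = 10 examples, 3 of them misclassified.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]
print(empirical_error(y_true, y_pred))  # 0.3
```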

Example: Confidence intervals
- We want to estimate error_D(h).
- The learned hypothesis incorrectly classifies 12 out of 40 examples in the test set T.
- Q: What is the true error rate?
- A: ???
- We estimate it using error_T(h), which follows a binomial law with mean $error_D(h)$ and standard error $\sqrt{error_D(h)(1 - error_D(h))/m}$.
- These are estimated using the normal law, with:
  mean: $error_T(h)$
  standard deviation: $\sqrt{error_T(h)(1 - error_T(h))/m}$

Confidence intervals
- The normal law: with probability N%, the true error error_D lies in the interval
  $error_T(h) \pm z_N \sqrt{error_T(h)(1 - error_T(h))/m}$

  N%:  50%   68%   80%   90%   95%   98%   99%
  z_N: 0.67  1.00  1.28  1.64  1.96  2.33  2.58

Confidence intervals (cf. Mitchell 97)
- If T contains m examples sampled independently, with m >= 30, then with probability 95% the true error e_D lies within
  $\hat{e}_S \pm 1.96 \sqrt{\hat{e}_S (1 - \hat{e}_S)/m}$

Example:
- The learned hypothesis incorrectly classifies 12 out of 40 test examples in T.
- Q: What will be the true error on unseen examples?
- A: With 95% confidence, the true error lies within [0.16; 0.44]:
  $m = 40$, $\hat{e}_S = 12/40 = 0.3$, $1.96 \sqrt{\hat{e}_S (1 - \hat{e}_S)/m} \approx 0.14$

95% confidence intervals

Performance curves
[Figure: test error and training error plotted with their 95% confidence intervals.]
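The slide's 12-out-of-40 example can be reproduced directly; a minimal sketch of the normal-approximation interval:

```python
import math

def error_confidence_interval(n_errors, m, z=1.96):
    """Normal-approximation interval for the true error (valid for m >= 30)."""
    e_hat = n_errors / m
    half = z * math.sqrt(e_hat * (1 - e_hat) / m)
    return e_hat - half, e_hat + half

lo, hi = error_confidence_interval(12, 40)  # 12 errors out of 40, as on the slide
print(round(lo, 2), round(hi, 2))           # 0.16 0.44
```

Other confidence levels follow by swapping in the corresponding z_N from the table above (e.g. z=1.64 for 90%).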

Evaluating learned hypotheses: various sets
[Figure: with a lot of data, the data set is split into a learning set and a test set, yielding an error estimate; with few data, this split becomes problematic.]

Small data sets: a dilemma
- Using more data for learning yields a better hypothesis but a worse error estimate; using more data for testing yields the reverse.

Cross-validation (k-fold)
- Split the data into k parts.
- For i = 1, ..., k: learn on the k-1 yellow parts, test on the remaining pink part -> error_i.
- error = (sum_i error_i) / k

The leave-one-out procedure
- Low bias, high variance.
- Tends to underestimate the error if the data are not fully i.i.d. [Guyon & Elisseeff, JMLR, 03]

The bootstrap estimate
- Draw a bootstrap sample from the data, learn on it, test on the held-out examples -> error; repeat and compute the mean.

Problem
- The calculation of the confidence interval assumes that the estimates are independent, but our estimates are not independent.
- Estimation of the true risk for the final h: mean of the risks over the k test samples, or mean of the risk over the whole data set.
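The k-fold procedure above can be sketched in a few lines; the `train_and_predict` interface and the majority-class baseline below are hypothetical stand-ins for an actual learner:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(X, y, train_and_predict, k=10):
    """Mean test error over k folds; train_and_predict(X_tr, y_tr, X_te) -> predictions."""
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        errors.append(sum(p != y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(errors) / k

def majority_learner(X_train, y_train, X_test):
    """Hypothetical baseline: always predict the most frequent training label."""
    maj = max(set(y_train), key=y_train.count)
    return [maj] * len(X_test)

# 10 examples, 8 labelled 0 and 2 labelled 1: the majority class is always 0,
# so exactly the two '1' examples are misclassified -> mean error 0.2.
print(cross_val_error(list(range(10)), [0] * 8 + [1] * 2, majority_learner, k=5))  # 0.2
```

Leave-one-out is the special case k = len(X).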

Types of performance criteria

Confusion matrices and various performance criteria

Confusion matrix
- Example: 14% of the butterflies are recognized as fishes.
- For two classes:

                Actual +   Actual -
  Predicted +     TP         FP
  Predicted -     FN         TN

  (TP = true positives, FP = false positives, FN = false negatives, TN = true negatives)
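A confusion matrix like the one above can be assembled directly from predictions; a minimal sketch with a hypothetical test set:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Counter keyed by (predicted, actual), matching the slide's layout."""
    return Counter(zip(y_pred, y_true))

# Hypothetical binary test set with classes '+' and '-'.
y_true = ['+', '+', '+', '-', '-', '-', '-', '+']
y_pred = ['+', '+', '-', '-', '-', '+', '-', '+']
cm = confusion_matrix(y_true, y_pred)
tp, fp = cm[('+', '+')], cm[('+', '-')]
fn, tn = cm[('-', '+')], cm[('-', '-')]
print(tp, fp, fn, tn)  # 3 1 1 3
```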

Types of performance criteria
[Figure-only slides: illustrations of the different performance criteria.]

Types of performance measures

Performance measures (read off the confusion matrix: rows = predicted class, columns = actual class)
- Sensitivity: TP / (TP + FN)
- Recall: TP / (TP + FN)
- Specificity: TN / (TN + FP)
- Precision: TP / (TP + FP)
- FN-rate: FN / (TP + FN)
- FP-rate: FP / (FP + TN)
- F-measure: 2 x recall x precision / (recall + precision) = 2 TP / (2 TP + FP + FN)
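All of these ratios come straight from the four counts of the confusion matrix; a minimal sketch with hypothetical counts:

```python
def performance_measures(tp, fp, fn, tn):
    """All the slide's ratios, computed from a binary confusion matrix."""
    return {
        "recall":      tp / (tp + fn),        # = sensitivity
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "fn_rate":     fn / (tp + fn),
        "fp_rate":     fp / (fp + tn),
        "f_measure":   2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical counts: tp=30, fp=10, fn=20, tn=40.
m = performance_measures(30, 10, 20, 40)
print(m["recall"], m["specificity"], m["precision"])  # 0.6 0.8 0.75
```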

Performance measures
[Worked example, values garbled in extraction: a confusion matrix over the classes "good" and "bad", with the precision of class "good" computed as Precision(good) = TP(good) / (TP(good) + FP(good)).]

The ROC curve

The ROC curve

Types of errors
[Figure: the score distributions of class '+' and class '-' on either side of a decision criterion (threshold); the threshold splits them into true positives / false negatives and true negatives / false positives. Shown for class priors of 10%/90% and of 50%/50%.]

ROC = Receiver Operating Characteristic

The ROC curve
[Figure: proportion of true positives plotted against the proportion of false positives; the ROC curve (relevance = 0.90) is compared to the chance line (relevance = 0.5). A "lax" decision threshold lies towards the top right of the curve, a "strict" threshold towards the bottom left.]
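Sweeping the decision criterion over the classifier's scores traces out the ROC curve point by point; a minimal sketch with hypothetical scores:

```python
def roc_points(scores, labels):
    """One ROC point per decision threshold t: predict '+' when score >= t."""
    pos = sum(1 for l in labels if l == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))  # (FP-rate, TP-rate)
    return points

# Hypothetical scores from a probabilistic classifier (label 1 = class '+').
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts[0], pts[-1])  # (0.0, 0.0) (1.0, 1.0)
```

A "lax" threshold corresponds to the points near (1, 1), a "strict" one to the points near (0, 0).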

The ROC curve: summary

Comparison of learning algorithms
- Comparison on a single data set: [Dietterich, 1998] recommends using 5 x 2 cross-validation, the paired t-test, or the McNemar test on a validation set.
- Comparison on multiple (different) data sets: [Demsar, 2006] recommends using the Wilcoxon signed-ranks test or the Friedman test.

Summary
- Pay attention to your cost function: what matters for the performance measure?
- Finite data: compute the confidence intervals.
- Scarce data: pay attention to the split between training data and test data; use cross-validation.
- Do not forget the validation set!

Specific problems
- The distribution of the classes is very unbalanced (e.g. 1% or 1 per mille for one of the two classes)
- Gray zone (uncertain labels)
- Multi-valued functions

- Evaluation is very important. Be critical. Convince yourself!

Other evaluation criteria
- Intelligibility of the learned decision function (e.g. SVMs or boosting are not good on this count)
- Generalization performance (often not correlated with the previous criterion)
- Various costs: data preparation, computational cost, cost of the ML expertise, cost of the domain expertise

References
- Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895-1924.
- Demsar, J. (2006). Statistical Comparisons of Classifiers over Multiple Data Sets. Journal of Machine Learning Research, 7, 1-30.
- Japkowicz, N. & Shah, M. (2011). Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press. (An interesting book)

The Weka ML toolkit
- http://www.cs.waikato.ac.nz/ml/weka/