Lecture 12: Classification


Lecture 12: Classification 2009-04-29 Patrik Malm, Centre for Image Analysis, Swedish University of Agricultural Sciences and Uppsala University

2 Reading instructions Chapters for this lecture: 12.1–12.2 in Gonzalez-Woods

3 Intelligence The ability to separate the relevant information from a background of irrelevant details. The ability to learn from examples and generalize the knowledge so that it can be used in new situations. The ability to draw conclusions from incomplete information.

4 Classification/Recognition We want to create an intelligent system that can draw conclusions from our image data. No classification, recognition or interpretation is possible without some kind of knowledge.

5 Some important concepts (again) Arrangements of descriptors are often called patterns. Descriptors are often called features. The most common pattern arrangement is an n-dimensional feature vector. Patterns are placed in classes of objects which share common properties. A collection of W classes is denoted ω1, ω2,..., ωW

6 Some important concepts (again) (Figure: example patterns from classes ω1, ω2, ω3, each described by a feature vector.)

7 Scatter plots A good way to illustrate relationships between features. (Figure: scatter plot of perimeter (circumference) versus radius.)

8 Scatter plots Example: 3-dimensional plot

9 Scatter plots Example: RGB color image.

10 Feature selection The goal in feature selection (which is a prerequisite for ALL kinds of classification) is to find a limited set of features that can discriminate between the classes. Adding features without verification will most likely NOT improve the result.

11 Feature selection Some examples. (Figures: one feature combination with limited separation between classes, one with good separation between classes.)

12 Object-wise and pixel-wise classification (revisited) Object-wise classification uses shape, size, mean intensity, mean color, etc. to describe patterns. Pixel-wise classification uses intensity, color, texture, spectral information, etc.

13 Object-wise and pixel-wise classification (revisited) (Figure: shape is an object-wise property; texture is a pixel-wise property.)

14 Classification based on texture Intensity image: no spectral information. Use features for pixel-wise classification based on neighboring pixel values (texture). Create additional artificial layers that contain information about the pixel neighborhood: filtered image versions, shifted image versions.
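A minimal sketch of building such artificial layers, assuming a grayscale NumPy image; the shift distance and the extra smoothing layer are illustrative choices:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_feature_stack(image, shift=2):
    """Stack the intensity image with shifted and filtered versions so that
    each pixel gets a feature vector describing its neighbourhood."""
    layers = [
        image,                                    # layer 1: original intensities
        np.roll(image, shift, axis=1),            # layer 2: shift in x
        np.roll(image, shift, axis=0),            # layer 3: shift in y
        np.roll(image, (shift, shift), (0, 1)),   # layer 4: shift in x and y
        uniform_filter(image.astype(float), 5),   # extra layer: local mean (a filtered version)
    ]
    return np.stack(layers, axis=-1)              # shape (rows, cols, n_features)
```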

15 Classification based on texture (Figure: Layer 1: original; Layer 2: shift 2 in x; Layer 3: shift 2 in y; Layer 4: shift 2 in x,y. Panels show the original image, training areas, ML classification, and relaxed result. Legend: yellow = open areas, green = forest, orange = cloud, red = shadow.)

16 Relaxation Used in pixel-wise classification to reduce noise. Uses a majority filter. The neighborhood size determines the amount of relaxation.
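A minimal sketch of such a majority filter, assuming a label image with small non-negative integer class labels; the neighborhood size controls the amount of relaxation:

```python
import numpy as np
from scipy.ndimage import generic_filter

def relax_labels(label_image, size=3):
    """Replace each pixel by the most common label in its size x size
    neighbourhood, suppressing isolated misclassified pixels."""
    def majority(values):
        return np.bincount(values.astype(int)).argmax()
    return generic_filter(label_image, majority, size=size)
```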

17 Classification methods Machine learning techniques: supervised learning, unsupervised learning, reinforcement learning, ...

18 Classification methods As covered in this course: Supervised methods: box classifier; Bayes classifiers (maximum likelihood, minimum distance). Unsupervised methods: clustering (k-means clustering, hierarchical clustering). Neural networks.

19 Supervised classification Objects/pixels belonging to a known class are used for training of the system and drawing decision lines between classes. New objects/pixels are classified using the decision lines. First apply knowledge, then classify

20 Unsupervised classification Assume that objects lying close to each other in the feature space belong to the same class. Order feature vectors into natural clusters representing the classes. After clustering: compare with reference data and identify the classes. First classify, then apply knowledge

21 Bayesian classifiers Based on a priori knowledge of class probabilities and the cost of errors; their combination gives an optimum statistical classifier (in theory). Simplifying assumptions lead to the maximum likelihood (ML) classifier and the minimum distance (MD) classifier.

22 Maximum likelihood classifier Classify according to the greatest probability (taking variance and covariance into consideration). Assume that the distribution within each class is Gaussian; the distribution within each class can then be described by a mean vector and a covariance matrix.
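A minimal sketch of such a classifier (equal class priors assumed): estimate one mean vector and covariance matrix per class, then pick the class with the greatest Gaussian likelihood.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_ml(X, y):
    """Estimate a mean vector and covariance matrix for each class."""
    return {c: (X[y == c].mean(axis=0), np.cov(X[y == c], rowvar=False))
            for c in np.unique(y)}

def classify_ml(x, params):
    """Assign x to the class with the greatest Gaussian likelihood."""
    return max(params, key=lambda c: multivariate_normal.pdf(x, *params[c]))
```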

23 Minimum distance classifier Each class is represented by its mean vector. Training uses the objects/pixels of known class to calculate the mean of the feature vectors within each class. New objects are classified by finding the closest mean vector.
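A minimal sketch, using Euclidean distance to the class means:

```python
import numpy as np

def train_md(X, y):
    """Mean feature vector of the training patterns within each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify_md(x, means):
    """Assign x to the class whose mean vector is closest."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))
```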

24 Artificial Neural Networks (ANNs) Create a classifier by adaptive development of coefficients for decisions found via training. Do not assume a normal (Gaussian) probability distribution. Simulate the association of neurons in the brain. Can draw decision borders in feature space that are more complicated than hyperquadrics. Require careful training.

25 Perceptron model A single perceptron is a linear classifier

26 Perceptron model
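A minimal sketch of the classic perceptron learning rule, assuming two classes labelled +1 and -1; it learns the weights and bias of the linear decision function sign(w·x + b):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Perceptron rule: update the weights only on misclassified examples."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # wrong side of the decision line
                w += lr * yi * xi
                b += lr * yi
    return w, b

def predict_perceptron(x, w, b):
    return np.sign(np.dot(w, x) + b)
```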

27 Neural networks Multilayer feed-forward network

28 Decision regions

29 Learning Learning rules: batch updates the weights after all examples; online updates the weights after each example. The most common training algorithm is backpropagation. Overfitting: the classifier adapts to noise or other errors in the training examples and fails to generalize from the examples.
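To make the batch/online distinction concrete, a small sketch for a single linear unit trained on squared error (not full backpropagation):

```python
import numpy as np

def online_epoch(w, X, y, lr=0.01):
    """Online learning: update the weights after each example."""
    for xi, yi in zip(X, y):
        grad = (np.dot(w, xi) - yi) * xi      # gradient of the squared error for one example
        w = w - lr * grad
    return w

def batch_epoch(w, X, y, lr=0.01):
    """Batch learning: average the gradient over all examples, then update once."""
    grad = ((X @ w - y)[:, None] * X).mean(axis=0)
    return w - lr * grad
```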

30 About trained (supervised) systems The features should be chosen based on their ability to separate the classes. Adding new features may lead to decreased performance. The number of training samples should be much larger than the number of features. Linearly dependent features should be avoided.

31 Unsupervised systems (clustering) k-means: top-down approach (divisive); predetermined number of clusters; tries to find natural centers in the data; result difficult to illustrate for more than 3 dimensions. Hierarchical: most often a bottom-up approach (agglomerative); merges patterns until all are one class; lets the user decide which clusters are natural; illustrates results through dendrograms.

32 k-means Tries to minimize some type of error criterion; squared error is the most common. The number of clusters k needs to be known. Often starts from a random guess. Stops when some type of criterion is fulfilled. Squared error for the clustering C of a pattern set P: see the formula below.
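The formula itself is not reproduced in this transcript; the standard squared-error criterion it refers to is

```latex
E(C) = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \lVert \mathbf{x}_i - \mathbf{c}_j \rVert^2
```

where C_j is the j-th cluster and c_j its center.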

33 k-means Algorithm 1. Choose k cluster centers to coincide with k randomly chosen patterns or k randomly defined points inside the hypervolume containing the pattern set. 2. Assign each pattern to the closest cluster center. 3. Recompute the cluster centers using the current cluster memberships. 4. If the convergence criterion is not fulfilled, go to step 2.
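A minimal NumPy sketch of these four steps (stopping when the centers no longer move; empty clusters are not handled):

```python
import numpy as np

def kmeans(P, k, max_iter=100, rng=None):
    """Basic k-means on a pattern set P of shape (n_patterns, n_features)."""
    rng = rng or np.random.default_rng()
    # 1. choose k cluster centers to coincide with k randomly chosen patterns
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. assign each pattern to the closest cluster center
        dists = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. recompute the cluster centers using the current memberships
        new_centers = np.array([P[labels == j].mean(axis=0) for j in range(k)])
        # 4. stop when the convergence criterion (no movement) is fulfilled
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```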

34 k-means Example

35 k-means Problems Local minima are not unlikely: repeat the algorithm with several starting configurations. The number of clusters needs to be known: run several different k and compare the results based on some kind of measure.

36 Hierarchical clustering Each pattern starts as its own cluster. Clusters are joined in pairs based on their proximity. The distance depends on which linkage is used. Single linkage: shortest distance. Complete linkage: furthest distance.

37 Hierarchical clustering Linkage types

38 Hierarchical clustering Algorithm 1. Compute the proximity matrix containing the distance between each pair of patterns. Treat each pattern as a cluster. 2. Find the most similar pair of clusters using the proximity matrix. Merge these two clusters into one cluster. Update the proximity matrix to reflect this merge operation. 3. If all patterns are in one cluster, stop. Otherwise, go to step 2.

39 Hierarchical clustering Dendrogram
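A minimal sketch using SciPy's agglomerative clustering on a toy pattern set; the linkage method selects single (shortest) or complete (furthest) distance, and the dendrogram lets the user decide which clusters look natural:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

P = np.random.rand(20, 2)                        # toy pattern set

Z = linkage(P, method='single')                  # or method='complete'
dendrogram(Z)                                    # inspect the merge hierarchy
plt.show()

labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into, e.g., 3 clusters
```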

40 Distance measures The results of the clustering methods are heavily dependent on the distance measure used: Euclidean, Manhattan/city block, chessboard (Chebyshev), Mahalanobis.
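Minimal sketches of the four distance measures for two feature vectors a and b (the Mahalanobis distance additionally needs a covariance matrix of the data):

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def manhattan(a, b):          # city block
    return np.abs(a - b).sum()

def chessboard(a, b):         # Chebyshev
    return np.abs(a - b).max()

def mahalanobis(a, b, cov):
    d = a - b
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))
```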

41 Reading instructions Chapters for next lecture: Chapter 4 in Gonzalez-Woods