Lecture 12: Classification 2 2009-04-29 Patrik Malm Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
2 Reading instructions Chapters for this lecture: 12.1 and 12.2 in Gonzalez and Woods.
3 Intelligence The ability to separate the relevant information from a background of irrelevant details. The ability to learn from examples and generalize the knowledge so that it can be used in new situations. The ability to draw conclusions from incomplete information.
4 Classification/Recognition We want to create an intelligent system that can draw conclusions from our image data. No classification, recognition or interpretation is possible without some kind of knowledge.
5 Some important concepts (again) Arrangements of descriptors are often called patterns. Descriptors are often called features. The most common pattern arrangement is an n-dimensional feature vector. Patterns are placed in classes of objects which share common properties. A collection of W classes is denoted ω1, ω2, ..., ωW.
6 Some important concepts (again) [Figure: a feature vector is assigned to one of the classes ω1, ω2, ω3]
7 Scatter plots A good way to illustrate relationships between features. [Figure: scatter plot of perimeter (circumference) against radius]
8 Scatter plots Example: 3-dimensional plot
9 Scatter plots Example: RGB color image.
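A minimal sketch (assuming numpy and matplotlib are available) of how such a scatter plot can be produced for two hypothetical features, radius and perimeter, measured on objects from two known classes:

```python
# Scatter plot of two hypothetical features for two classes of objects.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
radius = np.concatenate([rng.normal(5, 0.5, 50), rng.normal(9, 0.8, 50)])
perimeter = 2 * np.pi * radius + rng.normal(0, 1.0, 100)  # roughly correlated feature
labels = np.array([0] * 50 + [1] * 50)                    # two known classes

for c, marker in [(0, "o"), (1, "x")]:
    plt.scatter(radius[labels == c], perimeter[labels == c],
                marker=marker, label=f"class {c}")
plt.xlabel("radius")
plt.ylabel("perimeter (circumference)")
plt.legend()
plt.show()
```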
10 Feature selection The goal in feature selection (which is a prerequisite for ALL kinds of classification) is to find a limited set of features that can discriminate between the classes. Adding features without verification will most likely NOT improve the result.
11 Feature selection Some examples: [Figure: feature spaces with limited separation between classes and with good separation between classes]
12 Object-wise and pixel-wise classification (revisit) Object-wise classification uses shape, size, mean intensity, mean color etc. to describe patterns. Pixel-wise classification uses intensity, color, texture, spectral information etc.
13 Object-wise and pixel-wise classification (revisit) [Figure: shape is an object-wise description; texture is a pixel-wise description]
14 Classification based on texture An intensity image contains no spectral information. For pixel-wise classification, use features based on neighboring pixel values, i.e. texture. Create additional artificial layers that contain information about the pixel neighborhood: filtered image versions and shifted image versions.
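A minimal sketch, assuming numpy, of how shifted image versions can be stacked into artificial texture layers; the shift of 2 pixels and the placeholder image are example choices:

```python
# Build artificial texture layers by shifting an intensity image.
import numpy as np

def shifted_layers(image, shift=2):
    """Stack the original image with versions shifted in x, y and x,y."""
    layers = [
        image,
        np.roll(image, shift, axis=1),           # shift in x
        np.roll(image, shift, axis=0),           # shift in y
        np.roll(image, (shift, shift), (0, 1)),  # shift in x and y
    ]
    return np.stack(layers, axis=-1)             # shape: (rows, cols, 4)

img = np.random.rand(128, 128)    # placeholder intensity image
features = shifted_layers(img)    # one 4-dimensional feature vector per pixel
```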
15 Classification based on texture [Figure: layer 1: original, layer 2: shift 2 in x, layer 3: shift 2 in y, layer 4: shift 2 in x,y; original image, training areas, ML classification, relaxed result. Legend: yellow = open areas, green = forest, orange = cloud, red = shadow]
16 Relaxation Used in pixel-wise classification to reduce noise. Uses a majority filter. The neighborhood size determines the amount of relaxation.
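A minimal sketch of such a majority-filter relaxation, assuming scipy is available; the 3x3 neighborhood size is an example choice and controls the amount of relaxation:

```python
# Relax a noisy pixel-wise classification with a majority filter.
import numpy as np
from scipy import ndimage

def relax(label_image, size=3):
    """Replace each pixel's class label by the most common label in its neighborhood."""
    def majority(values):
        return np.bincount(values.astype(int)).argmax()
    return ndimage.generic_filter(label_image, majority, size=size)

labels = np.random.randint(0, 3, (64, 64))  # placeholder noisy label image
relaxed = relax(labels, size=3)
```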
17 Classification methods Machine learning techniques Supervised learning Unsupervised learning Reinforcement learning...
18 Classification methods As covered in this course: Supervised methods: box classifier; Bayes classifiers (maximum likelihood, minimum distance). Unsupervised methods: clustering (k-means clustering, hierarchical clustering). Neural networks.
19 Supervised classification Objects/pixels belonging to a known class are used to train the system and to draw decision lines between the classes. New objects/pixels are classified using the decision lines. First apply knowledge, then classify.
20 Unsupervised classification Assume that objects lying close to each other in the feature space belong to the same class. Order the feature vectors into natural clusters representing the classes. After clustering: compare with reference data and identify the classes. First classify, then apply knowledge.
21 Bayesian classifiers Based on a priori knowledge of class probabilities and the cost of errors; the combination gives an optimum statistical classifier (in theory). Simplifying assumptions lead to the maximum likelihood (ML) classifier and the minimum distance (MD) classifier.
22 Maximum likelihood classifier Classify according to the greatest probability (taking variance and covariance into consideration). Assume that the distribution within each class is Gaussian. The distribution within each class can then be described by a mean vector and a covariance matrix.
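A minimal sketch of a Gaussian maximum likelihood classifier under these assumptions; the function names are illustrative, and each class must contribute enough training samples for a non-singular covariance matrix:

```python
# Gaussian maximum likelihood classifier: one mean vector and covariance matrix per class.
import numpy as np

def train_ml(X, y):
    """Estimate a mean vector and covariance matrix for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def classify_ml(x, params):
    """Assign x to the class with the largest Gaussian log-likelihood."""
    best, best_ll = None, -np.inf
    for c, (mu, cov) in params.items():
        diff = x - mu
        ll = -0.5 * (np.log(np.linalg.det(cov)) + diff @ np.linalg.inv(cov) @ diff)
        if ll > best_ll:
            best, best_ll = c, ll
    return best
```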
23 Minimum distance classifier Each class is represented by its mean vector. Training is done using the objects/pixels of known class, calculating the mean of the feature vectors for the objects within each class. New objects are classified by finding the closest mean vector.
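A minimal sketch of the minimum distance classifier; the feature vectors and class labels below are hypothetical:

```python
# Minimum distance classifier: store class means, assign new samples to the nearest mean.
import numpy as np

def train_md(X, y):
    """Return one mean feature vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify_md(x, means):
    """Assign x to the class whose mean vector is closest (Euclidean distance)."""
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Example with hypothetical 2-D feature vectors
X = np.array([[1.0, 1.2], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])
means = train_md(X, y)
print(classify_md(np.array([3.8, 4.0]), means))  # -> 1
```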
24 Artificial Neural Networks (ANNs) Create a classifier by adaptive development of coefficients for decisions found via training. Do not assume a normal (Gaussian) probability distribution. Simulate the association of neurons in the brain. Can draw decision borders in feature space that are more complicated than hyperquadrics. Require careful training.
25 Perceptron model A single perceptron is a linear classifier
26 Perceptron model
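A minimal sketch of a single perceptron trained with the classic perceptron learning rule (an assumed training procedure, since the slides only show the model); as stated above, it can only produce a linear decision boundary:

```python
# Single perceptron: a linear classifier updated one example at a time.
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """y must contain labels -1 or +1; returns weights w and bias b."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified example: update the weights
                w += lr * yi * xi
                b += lr * yi
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)
```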
27 Neural networks Multilayer feed-forward network
28 Decision regions
29 Learning Learning rules: batch learning updates the weights after all examples; online learning updates the weights after each example. A common training algorithm is backpropagation. Overfitting: the classifier adapts to noise or other errors in the training examples and fails to generalize from the examples.
30 About trained (supervised) systems Features should be chosen based on their ability to separate the classes. Adding new features may lead to decreased performance. The number of training samples should be much larger than the number of features. Linearly dependent features should be avoided.
31 Unsupervised systems (clustering) k-means: a top-down approach (divisive); predetermined number of clusters; tries to find natural centers in the data; the result is difficult to illustrate for more than 3 dimensions. Hierarchical: most often a bottom-up approach (agglomerative); merges patterns until all are in one class; lets the user decide which clusters are natural; illustrates the results through dendrograms.
32 k-means Tries to minimize some type of error criterion; squared error is the most common. The number of clusters k needs to be known. Often starts from a random guess. Stops when some type of criterion is fulfilled. Squared error for a clustering C of a pattern set P: e²(P, C) = Σ_{j=1..k} Σ_{x ∈ C_j} ||x − c_j||², where c_j is the center of cluster C_j.
33 k-means Algorithm 1. Choose k cluster centers to coincide with k randomly chosen patterns or k randomly defined points inside the hypervolume containing the pattern set. 2. Assign each pattern to the closest cluster center. 3. Recompute the cluster centers using the current cluster memberships. 4. If the convergence criterion is unfulfilled, go to 2.
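A minimal sketch of this algorithm, assuming numpy; it also evaluates the squared-error criterion from the previous slide and assumes that no cluster becomes empty:

```python
# k-means clustering following the four steps above.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # step 1: random patterns as centers
    for _ in range(max_iter):
        # step 2: assign each pattern to the closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute the centers from the current memberships (assumes no empty cluster)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # step 4: stop when the centers no longer move (convergence criterion)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    sq_error = ((X - centers[labels]) ** 2).sum()  # squared-error criterion e²(P, C)
    return labels, centers, sq_error
```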
34 k-means Example
35 k-means Problems Local minima are not unlikely: repeat the algorithm with several different starting configurations. The number of clusters needs to be known: run with several different values of k and compare the results based on some kind of measure.
36 Hierarchical clustering Each pattern starts as its own cluster. Clusters are joined in pairs based on their proximity. The distance depends on which linkage is used. Single linkage: shortest distance. Complete linkage: furthest distance.
37 Hierarchical clustering Linkage types
38 Hierarchical clustering Algorithm 1. Compute the proximity matrix containing the distance between each pair of patterns. Treat each pattern as a cluster. 2. Find the most similar pair of clusters using the proximity matrix. Merge these two clusters into one cluster. Update the proximity matrix to reflect this merge operation. 3. If all patterns are in one cluster, stop. Otherwise, go to step 2.
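A minimal sketch of this algorithm with single linkage, assuming numpy; the n_clusters parameter lets the user stop the merging early and inspect the natural clusters, while the default of 1 merges everything into one cluster as in step 3:

```python
# Agglomerative (bottom-up) clustering with single linkage.
import numpy as np

def single_linkage(X, n_clusters=1):
    clusters = [[i] for i in range(len(X))]                          # each pattern starts as a cluster
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)     # proximity matrix (step 1)
    while len(clusters) > n_clusters:
        # find the most similar pair of clusters (shortest single-link distance, step 2)
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters[b]   # merge the pair into one cluster
        del clusters[b]
    return clusters
```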
39 Hierarchical clustering Dendrogram
40 Distance measures The results of the clustering methods are heavily dependent on the distance measure used. Euclidean Manhattan/City block Chessboard Mahalanobis
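A minimal sketch of the four distance measures listed above for two feature vectors, assuming numpy; the Mahalanobis distance additionally needs a covariance matrix estimated from the data:

```python
# Common distance measures between two feature vectors a and b.
import numpy as np

def euclidean(a, b):
    return np.sqrt(((a - b) ** 2).sum())

def manhattan(a, b):      # city block
    return np.abs(a - b).sum()

def chessboard(a, b):     # Chebyshev
    return np.abs(a - b).max()

def mahalanobis(a, b, cov):
    diff = a - b
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)
```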
41 Reading instructions For the next lecture: Chapter 4 in Gonzalez and Woods.