PATTERN RECOGNITION Introduction; Delimiting the territory


PATTERN RECOGNITION Introduction; Delimiting the territory. Václav Hlaváč, Czech Technical University in Prague, Czech Institute of Informatics, Robotics and Cybernetics, 166 36 Prague 6, Jugoslávských partyzánů 1580/3, Czech Republic, http://people.ciirc.cvut.cz/hlavac, vaclav.hlavac@cvut.cz; also Center for Machine Perception, http://cmp.felk.cvut.cz. Courtesy: M.I. Schlesinger, V. Franc. Outline of the talk: Global picture, epistemology. Modeling and the system-theory approach. Pattern recognition, learning. Statistical vs. structural PR. Bayesian formulation. What has been known in PR?

What is pattern recognition? 2/28 Epistemology is a branch of philosophy dealing with the origin, nature, methods and scope of cognition/knowledge; pattern recognition is one of its methods. Pattern recognition / machine learning (almost synonyms) is a scientific discipline that constructs and studies algorithms which learn from data by building a statistical model and use it for making decisions or predictions. "Pattern recognition is the assignment of a physical object or event to one of several prespecified categories" (the book by Duda & Hart, 1977, 2001). A pattern is an object, process or event that can be given a name. A pattern class (or category) is a set M ⊆ X of elements (patterns) sharing common attributes, i.e. finite recognizable characteristics (features). Classification (or recognition) assigns given objects to prescribed classes. A classifier is a machine (a program) which performs classification.

A pattern class, examples (1) 3/28 Set of all syntactically correct arithmetic expressions, e.g., 2x(a + 3b), 6y + (x − y)/7. M is a subset of the set X of all finite strings over an alphabet; M can be described by a context-free grammar. Set of all binary-valued images containing non-overlapping and non-touching one-pixel-wide rectangular frames. M is a subset of the set X of all rectangular binary-valued images.
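The first example class can be made concrete. The sketch below tests membership in M using a small hand-written context-free grammar and a recursive-descent recognizer; the grammar and the code are illustrative, not taken from the lecture, and juxtaposition such as 2x is read as implicit multiplication so that the slide's examples are accepted.

```python
# Grammar (illustrative):
#   Expr   -> Term (('+' | '-') Term)*
#   Term   -> Factor (('*' | '/') Factor | Factor)*   (juxtaposition = multiply)
#   Factor -> letter | number | '(' Expr ')'
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([a-z])|([+\-*/()]))")

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise ValueError("bad character")
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens

def recognize(s):
    """Return True iff the string s belongs to the class M."""
    try:
        toks = tokenize(s)
    except ValueError:
        return False
    i = 0
    def expr():
        nonlocal i
        term()
        while i < len(toks) and toks[i] in "+-":
            i += 1
            term()
    def term():
        nonlocal i
        factor()
        while i < len(toks):
            if toks[i] in "*/":
                i += 1
                factor()
            elif toks[i] == "(" or toks[i].isdigit() or toks[i].isalpha():
                factor()  # implicit multiplication, as in 2x(a + 3b)
            else:
                break
    def factor():
        nonlocal i
        if i >= len(toks):
            raise ValueError("unexpected end")
        if toks[i] == "(":
            i += 1
            expr()
            if i >= len(toks) or toks[i] != ")":
                raise ValueError("missing )")
            i += 1
        elif toks[i].isdigit() or toks[i].isalpha():
            i += 1
        else:
            raise ValueError("unexpected token")
    try:
        expr()
        return i == len(toks)
    except ValueError:
        return False
```

Both slide examples are accepted, while malformed strings fall outside M.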

A pattern class, examples (2) 4/28 Set of all dogs in images. Acknowledgement: Boris Flach.

Basic concepts, an illustration 5/28 A pattern is studied (a potato in our example, see the illustration). A feature vector x ∈ X is a vector of observations (measurements); x constitutes a single point in the feature (vector) space X. A hidden state (a class label in a special case) y ∈ Y cannot be measured directly. Patterns with equal hidden states belong to the same class. The task is to design a classifier (a decision rule) q: X → Y assigning a pattern instance to a hidden state. [Illustration: the pattern is mapped to a feature vector x = (x1, x2, ..., xn) and a hidden state (class label) y.]
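As a concrete, deliberately naive instance of a decision rule q: X → Y, the sketch below learns one centroid per class from labeled feature vectors and assigns a new pattern to the class of the nearest centroid; the data, names and the choice of nearest-centroid classification are illustrative, not from the lecture.

```python
import math

def train_centroids(samples):
    """samples: list of (feature_vector, label) pairs; returns one mean per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def q(x, centroids):
    """The decision rule q: X -> Y, assigning x to the class of the nearest centroid."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))
```

The feature vector x is a point in the feature space; q partitions that space into regions, one per hidden state.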

Pattern recognition, a motivating example 6/28 An object (situation) is described by two parameters: x, an observable feature (also called an observation), and y, a hidden parameter (state; in a special case, a class). Example of statistical PR: jockeys and basketball players. [Illustration: a scatter plot with x1 = weight [kg] and x2 = height [cm], in which jockeys and basketball players form two clusters.]

The overall picture, components 7/28 [Diagram: real world → observations (sensors, cameras, databases) → data preprocessing (data normalization, noise filtration, feature extraction) → dimensionality reduction (feature selection, feature projection) → decision or prediction from data (classification, regression, clustering, formal description) → decision result; a selected statistical model feeds the decision stage, and model quality is assessed by ROC analysis, cross-validation and bootstrapping.] Input: data, a training (multi)set. Statistical models and their parameters are learned empirically from the training data. Outputs: diverse decisions; see the diagram.
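The chain in the diagram can be sketched as a composition of stages. Every stage below (the mean-removal normalization, the two toy features, the threshold rule and all thresholds) is an illustrative stand-in, not part of the lecture.

```python
def normalize(x):
    """Data preprocessing: shift observations to zero mean."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def extract_features(x):
    """Feature extraction: reduce the observation to two summary features."""
    return (max(x) - min(x), sum(v * v for v in x))  # (range, energy)

def classify(f):
    """Decision: a toy threshold rule on the energy feature."""
    return "active" if f[1] > 1.0 else "quiet"

def pipeline(observation, stages):
    """Run the observation through the chain of processing stages."""
    for stage in stages:
        observation = stage(observation)
    return observation

decision = pipeline([1.0, 2.0, 3.0], [normalize, extract_features, classify])
```

The point of the composition is that each box in the diagram is replaceable independently of the others.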

Classification is an old scientific problem 8/28 The nature of classification and decision has been a central theme in philosophical epistemology, the study of the nature of knowledge. The foundations of pattern recognition can be traced back to Plato (Πλάτων, 428 BC – 348 BC) and his student Aristotle (384 BC – 322 BC), who distinguished between: an essential property, shared by all members of a class or natural kind, and an accidental property, which may differ among members of the class.

Classification/categorization (or the functional description) 9/28

Types of decision / prediction problems 10/28 Classification assigns the observation to a class from a small (discrete) set of possible classes. The output is a label, an identifier of the class; e.g., a system grades apples as A, B, C, or reject. Regression predicts a value from the observation; it is a generalization of classification. The output could be, e.g., a real number such as a company value based on its past performance and stock-market indicators. Unsupervised learning (clustering) organizes observations into meaningful classes based on their mutual similarities; e.g., in transcriptomics it builds groups of genes with related expression patterns (called coexpressed genes). Structural-relations representation: the object is described using basic primitives, e.g., the observation of a human by a surveillance camera as a composition of prototypical actions and body positions. A structure comes into play.
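Of the problem types above, clustering is the one with no teacher. Below is a minimal k-means sketch, a standard clustering algorithm grouping observations by mutual similarity; the deterministic initialization from the first k points and the toy data are illustrative simplifications.

```python
import math

def kmeans(points, k, iters=10):
    """Group points into k clusters by Euclidean similarity."""
    centers = list(points[:k])  # naive deterministic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # update step: each center moves to the mean of its cluster
        for j, c in enumerate(clusters):
            if c:
                centers[j] = tuple(sum(v) / len(c) for v in zip(*c))
    return centers, clusters
```

On two well-separated groups of observations the alternation of assignment and update converges in a few iterations.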

Other disciplines sharing similar core ideas 11/28 Statistical modelling finds a (generative) model describing the studied object, e.g., using probability distributions, and assesses its quality using statistical techniques. Machine learning: given a set of training examples, learn the decision rules automatically; no manual (subjective) definition of rules is involved. A different task requires a different set of training examples. Data mining: extraction of implicit, previously unknown and potentially useful knowledge from data. Scientific visualization: a high-dimensional problem should be visualized as a 2D image or a 3D scene; we humans do not see more dimensions. Neural networks: one of the mathematical formalisms aimed at solving a decision problem without necessarily creating a model of a real biological system.

Biological motivation 12/28 A human is considered the most advanced animal, also due to the ability to think about the way she/he reasons. There is a general interest in mimicking biological perception in machines. One of the aims is to imitate intelligent behavior in a partly unknown environment. The ability to learn using stimuli from the surrounding world is a basic attribute of intelligent behavior. Pattern recognition provides certain insight into how learning can be performed. A key question is knowledge representation. Among us humans, the observable means for sharing knowledge, the natural language, is the most advanced tool for expressing observations, descriptions of phenomena, problem formulations, their solutions and related learning issues.

Complex phenomena and the system approach 13/28 The desire to understand complex phenomena, e.g., in biology, social sciences or technology, requires analyzing the involved phenomena in a complex way, taking into account very many relations and different contexts. The system approach contrasts with the Newtonian endeavor to reduce every phenomenon to relations among basic elements and their basic properties.

A few concepts from system theory 14/28 While analyzing a complex phenomenon, we restrict ourselves to the part which is of interest to us. We call it the object (or sometimes the system). The rest (which is unimportant from the chosen point of view) is called the background. Objects are often not analyzed in their entire complexity; instead, only those properties which seem to be of interest are observed or measured in a given study. System theory uses the term resolution for different points of view. The object description (often mathematical) varies both quantitatively and qualitatively when the resolution is changed. A change of resolution provides a meta-view allowing one to find a qualitative change in the object description.

Generative vs. discriminative object representation 15/28 The attempt at an exact description of objects (complex phenomena) using mathematical tools leads, roughly speaking, to two possible approaches: 1. Generative modeling attempts to understand physical or other principles and express them by models. Such a model is able to generate data similar to those observed empirically. An example is the mathematical modeling of a physical / technological phenomenon (in the Newtonian sense). 2. Discriminative classification attempts to understand the outer behaviour without knowing the detailed principles (which are unknown for complex objects / phenomena). The outputs are decisions / predictions in the regression sense. An example is recognition (classification), e.g., determining the diagnosis of a disease by a physician / computer program.

Mathematical modeling 16/28 The important properties of the objects are mimicked using mathematical equations. The relation between the input and the output is often sought. The approach is often close to the Newtonian approach, as the desire is to obtain a detailed and preferably deterministic explanation. Example: a feasible mathematical model of a power-plant boiler used in control engineering predicts almost the same behavior as the real boiler. Counterexample 1: in many cases we are not able to create a mathematical model of a complex system, e.g., a model describing how the human body functions. Counterexample 2: computer vision. The inverse task to the physical process of image formation is too complex and thus not useful in practice.

Pattern recognition as an alternative to modeling 17/28 Pattern recognition assigns observations, according to some decision rule, to a priori known classes of objects. The classes are equivalence classes (reflexivity, symmetry, transitivity): objects within a class are more similar to each other than objects from different classes. The understanding of the object is often weaker in pattern recognition than in modeling.

The role of learning in pattern recognition 18/28 The advantage of PR is that a human creating the recognition rule does not need to understand the complex nature of the object: a decision rule can be learned empirically from many observed examples. Knowledge-engineering paradox: it is easier for humans to give examples of correct classification than to express an explicit classification rule. Three main approaches to learning: Supervised learning is based on a training set comprising observations and corresponding decisions assigned by a teacher (an expert). Unsupervised learning seeks similarities among observations without having an expert classification at hand. Reinforcement learning explores reward information (positive, negative) from the environment; a cumulative reward is maximized.

Pattern recognition and applications 19/28 Pattern recognition theory and tools can be separated from applications. [Diagram: object → getting a formal description → object representation → classification → class label.]

Main approaches to pattern recognition 20/28 1. Statistical (feature-based) pattern recognition. A statistical model of patterns and pattern classes is assumed. The coordinate axes correspond to individual observations (features, measurements) expressed by numerical values; objects are represented as points in a vector space. 2. Structural pattern recognition. There is a structure among observations; the aim is to represent and explore it. Formal grammars are the oldest and most advanced tool for representing the structure. 3. Artificial neural networks. The classifier is represented as a network of cells modeling neurons of the human brain (the connectionist approach), e.g., a feedforward model of the neural network (McCulloch, Pitts, 1943).
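A minimal sketch of item 3, the connectionist approach: a single perceptron learning a linear decision rule from labeled examples. This is the classic perceptron update rule, not code from the lecture; the toy data, learning rate and epoch count are illustrative.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (feature_vector, label) with label in {-1, +1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            # on a mistake (or a zero margin), nudge the hyperplane toward x
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(x, w, b):
    """The learned linear decision rule."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

On linearly separable data (here, a logical AND) the perceptron convergence theorem guarantees that the updates stop after finitely many mistakes.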

Bayesian decision making 21/28 The Bayesian task of statistical decision making: given sets X (observations), Y (hidden states) and D (decisions), a joint probability p_XY: X × Y → R and a penalty function W: Y × D → R, find a strategy q: X → D which minimizes the Bayesian risk R(q) = Σ_{x∈X} Σ_{y∈Y} p_XY(x, y) W(y, q(x)). The solution to the Bayesian task is the Bayesian strategy q minimizing the risk. Notes: the optimal strategy is deterministic; it separates the observation space into convex subsets. Classification is the special case of the decision-making problem in which the set of decisions D and the set of hidden states Y coincide.
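The formulation above can be made numeric on tiny finite sets (all probabilities are illustrative, and the penalty is the usual 0/1 loss): for each observation x, the Bayesian strategy picks the decision minimizing the partial risk Σ_y p_XY(x, y) W(y, d), and R(q) sums the penalized joint probabilities.

```python
X = ["x1", "x2"]
Y = ["y1", "y2"]
D = Y  # classification: decisions coincide with hidden states

# joint probability p_XY(x, y); illustrative numbers summing to 1
p = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
     ("x2", "y1"): 0.2, ("x2", "y2"): 0.3}

def W(y, d):
    """0/1 penalty: no cost for a correct decision, unit cost otherwise."""
    return 0.0 if y == d else 1.0

def risk(q):
    """Bayesian risk R(q) = sum over x and y of p(x, y) * W(y, q(x))."""
    return sum(p[x, y] * W(y, q[x]) for x in X for y in Y)

# the Bayesian strategy minimizes the partial risk for each x separately
q_star = {x: min(D, key=lambda d: sum(p[x, y] * W(y, d) for y in Y))
          for x in X}
```

Here q_star decides y1 for x1 and y2 for x2, and R(q_star) = 0.1 + 0.2 = 0.3, the smallest risk any of the four deterministic strategies achieves.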

Generality of the Bayesian formulation (1) 22/28 Motto: let the set X (observations) and the set Y (hidden states) be two finite sets. Statistical pattern recognition results are very general: the properties of the sets X (observations) and Y (hidden parameters) were not constrained. The sets X and Y can formally have a (mathematically) diverse structure, so the approach can be, and is, used in very different applications.

Generality of the Bayesian formulation (2) 23/28 The observation x can be a number, a symbol, a function of two variables (e.g., an image), a graph, an algebraic structure, etc.

Application | Observation | Decisions
value of a coin in a slot machine | x ∈ Rⁿ | value
optical character recognition | 2D bitmap, gray-level image | characters, words
license plate recognition | 2D bitmap, gray-level image | characters, numbers
fingerprint recognition | 2D bitmap, gray-level image | personal identity
speech recognition | signal from a microphone x(t) | words
EEG, ECG analysis | x(t) | diagnosis
forgery detection | various | {yes, no}
speaker identification | signal from a microphone x(t) | personal identity
speaker verification | signal from a microphone x(t) | {yes, no}

Generative vs. discriminative classifiers 24/28 Cf. the more general distinction between generative and discriminative models on slide 15 of this lecture. We wish to learn either a decision strategy q: X → Y or the posterior probability P(Y|X). Generative classifiers, e.g., the naïve Bayes classifier, model-based classifiers such as the Gaussian mixture model, ...: assume P(X|Y) and P(Y) to be functions; estimate P(X|Y) and P(Y) from training data directly; use the Bayes rule to calculate P(Y | X = x_i). Generative means that the model produces data subject to the probability distribution via sampling. Discriminative classifiers, e.g., the perceptron, SVM, k-NN, ...: assume the posterior P(Y|X) to be a function; estimate P(Y|X) from training data. Discriminative means that the model enables classification of x but cannot generate x complying with the probability model.
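The generative route above can be sketched end to end: estimate P(Y) and P(X|Y) from training data (here X is a single discrete feature), then apply the Bayes rule to obtain P(Y|X) for classification. The fruit data and all names are illustrative, not from the lecture.

```python
from collections import Counter

def fit_generative(samples):
    """Estimate P(y) and P(x|y) from (x, y) pairs with a discrete x."""
    n = len(samples)
    class_counts = Counter(y for _, y in samples)
    feature_counts = {}
    for x, y in samples:
        feature_counts.setdefault(y, Counter())[x] += 1
    prior = {y: c / n for y, c in class_counts.items()}
    likelihood = {y: {x: c / class_counts[y] for x, c in cnt.items()}
                  for y, cnt in feature_counts.items()}
    return prior, likelihood

def posterior(x, prior, likelihood):
    """Bayes rule: P(y|x) is proportional to P(x|y) * P(y)."""
    joint = {y: prior[y] * likelihood[y].get(x, 0.0) for y in prior}
    z = sum(joint.values())
    return {y: v / z for y, v in joint.items()}
```

Because the model holds P(Y) and P(X|Y) explicitly, it could also sample new (x, y) pairs; a discriminative classifier, which only fits P(Y|X) or a decision boundary, cannot.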

What has been known in statistical pattern recognition? 25/28 The Bayesian formulation based on a known statistical model. Solutions to some special non-Bayesian tasks, e.g., tasks with the class "I do not know" (also called the reject option), the minimax classifier, tasks with non-random interventions. Linear classifiers and their learning, e.g., the popular special case of Support Vector Machines. Embedding of a non-linear problem into a higher-dimensional vector space, mainly locally acting kernel methods. Estimates of the training-set length needed for a prescribed precision and reliability of classification (e.g., the Vapnik-Chervonenkis theory of learning). Unsupervised learning, variants of the EM algorithm. V. Franc, V. Hlaváč: Statistical Pattern Recognition Toolbox in MATLAB, in development since 2000.
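The embedding idea mentioned in the list above can be shown in one dimension: the points below are not separable by any single threshold on the line, but after the non-linear map φ(x) = (x, x²) into the plane, a linear rule separates them. The map, the data and the chosen hyperplane are all illustrative.

```python
def phi(x):
    """A non-linear embedding of the line into the plane."""
    return (x, x * x)

# class +1 sits in the middle of the line, class -1 on the outside:
# no single threshold on x separates the two classes
points = [(-2.0, -1), (-0.5, +1), (0.5, +1), (2.0, -1)]

def linearly_separated(points, w, b):
    """Check that sign(w . phi(x) + b) matches the label for every point."""
    def score(x):
        u, v = phi(x)
        return w[0] * u + w[1] * v + b
    return all((score(x) > 0) == (y > 0) for x, y in points)

# in the (x, x^2) plane the hyperplane w = (0, -1), b = 1,
# i.e. the rule x^2 < 1, is a linear separator
ok = linearly_separated(points, (0.0, -1.0), 1.0)
```

Kernel methods exploit exactly this effect while never computing φ explicitly, only inner products in the embedded space.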

Application of mathematical statistics 26/28 The most developed part of statistics is the statistics of random numbers. Recommendations are based on concepts such as mathematical expectation, dispersion, correlation, the covariance matrix, ... Tools of mathematical statistics can be used to solve many practical problems provided the random object can be represented by a number (or a vector of numbers). There has been substantial success in statistical pattern recognition for vectors of features, but failure for images; see the next slide.

Image analysis & objects 27/28 Failure for images f(x, y), where f is the brightness or color of a pixel and x, y are the pixel coordinates: inverting the image-formation process leads to an ill-posed task and is thus practically useless. We need to anchor to the concept of objects and explore its semantics. Object detection and its segmentation in images is a chicken-and-egg problem; a link between the semantics and the object appearance is needed. [Diagram: knowledge = observations + context + experience; a concept in our mind gets a label (symbol), raising the problematic symbol-grounding issue; perception yields a percept (sensory information) of the object (the thing itself), connected through context and learning / reasoning.]

Recommended reading 28/28 Duda, Richard O., Hart, Peter E., Stork, David G.: Pattern Classification, John Wiley & Sons, New York, USA, 2001, 654 p. Schlesinger, M.I., Hlaváč, V.: Ten Lectures on Statistical and Syntactic Pattern Recognition, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2002, 521 p. Bishop, C.: Pattern Recognition and Machine Learning, Springer-Verlag, New York, 2006, 758 p.