Statistical Pattern Recognition


A Brief Overview of the Course
Hamid R. Rabiee, Jafar Muhammadi, Nima Pourdamghani
Spring 2012
http://ce.sharif.edu/courses/90-91/2/ce725-1/

Agenda
- What is a pattern?
- What is Pattern Recognition (PR)?
- Applications of PR
- Components of a PR system
- Features
- Types of Learning
- The Design Cycle
- Pattern Recognition Approaches
- Brief Mathematical Overview
- Course Road Map

What is a pattern?
A pattern is the opposite of chaos: an entity, object, process, or event, possibly vaguely defined, that can be given a name or label. For example, a pattern could be:
- A fingerprint image
- A handwritten cursive word
- A human face
- A speech signal
- A texture

What is Pattern Recognition?
Pattern recognition (PR) is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. It is the assignment of a physical object or event to one of several pre-specified categories.
Related terms:
- A pattern class (or category) is a set of patterns sharing common attributes, usually originating from the same source.
- During recognition (or classification), given objects are assigned to prescribed classes (i.e., they get labeled).
- A classifier is a machine that performs classification.

An Example
- Four pattern categories (classes): sea, beach, jungle, sky
- Common attributes (features): color, contrast, texture
- Goal: observing some labeled pixels, we wish to assign a label to each new (unlabeled) pixel.

Applications of PR
- Handwritten digit recognition. Input pattern: pictures of handwritten digits. Output classes: the digits 0 through 9.
- Skin detection. Input pattern: a picture. Output classes: skin / not skin, for each pixel.
- Speech recognition. Input pattern: a speech waveform. Output classes: the specified spoken words.

Applications of PR (continued)
- Document classification (e.g. web news classification). Input pattern: a text or HTML document. Output classes: semantic categories (e.g. business, sports, ...).
- Financial time series prediction. Input pattern: the relation between consecutive data points of the time series. Output values: possible values of the output (a regression problem).
- Sequence analysis (bioinformatics). Input pattern: DNA / protein sequences. Output: known types of genes.
- Spam detection. Input pattern: the text / images of emails. Output classes: spam / not spam.

Components of a PR system
The processing pipeline runs from the real world through sensors and preprocessing (pattern space), feature extraction (feature space), and the classifier (classification space) to the final class assignment; training data and a learning algorithm are used to build the classifier.
Components:
- Sensors and preprocessing
- Feature extraction
- Classifier
- Training data: provides the information needed for supervised learning
- Learning algorithm: creates the classifier from the training data (labeled samples)

Components of a PR system: Example
Separating different types of fish:
- Sensor: camera
- Preprocessing: segmentation
- Features: ask experts about the major differences between types, or examine different fish and find the differences (length, width, number of fins, ...)
- Learning: ask experts for the type of the sample fish; find the typical length of each type
- Classification: compare the length (width, etc.) of a new fish to the learned lengths
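As a concrete (and heavily simplified) illustration of the last two steps, the sketch below implements the "compare to the learned typical values" idea as a nearest-mean classifier. The fish measurements and class names are invented for illustration.

```python
# Minimal sketch of the fish example: a nearest-mean classifier on two
# hand-picked features. All numbers below are made up for illustration.
import numpy as np

# Training data: [length_cm, width_cm] for labeled sample fish
salmon = np.array([[55.0, 12.0], [60.0, 13.5], [58.0, 12.8]])
sea_bass = np.array([[75.0, 18.0], [80.0, 19.5], [78.0, 18.7]])

# "Learning": store the typical (mean) feature vector of each class
means = {"salmon": salmon.mean(axis=0), "sea_bass": sea_bass.mean(axis=0)}

def classify(fish):
    """Assign the new fish to the class with the closest mean feature vector."""
    return min(means, key=lambda c: np.linalg.norm(fish - means[c]))

print(classify(np.array([77.0, 18.2])))  # -> 'sea_bass'
```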

Features
A feature is any distinctive aspect, quality, or characteristic. Features may be symbolic (e.g., color) or numeric (e.g., height).
Definitions:
- The combination of d features is represented as a d-dimensional column vector x = (x_1, x_2, ..., x_d)^T, called a feature vector.
- The d-dimensional space defined by the feature vector is called the feature space.
- Objects are represented as points in feature space; such a plot of the objects is called a scatter plot.

Features: Fish Separation Example
The length alone is a poor feature! Select the lightness as a possible additional feature.

Features: Fish Separation Example
(Figure: the scatter plot of the fish over the two features lightness and width.)
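A scatter plot like the one described can be produced with a few lines of matplotlib; the lightness and width values below are made up for illustration.

```python
# Feature space as a scatter plot: each object is a point (lightness, width).
# The values are invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

salmon = np.array([[3.2, 12.0], [2.9, 13.1], [3.5, 12.4], [3.0, 11.8]])
sea_bass = np.array([[6.1, 17.5], [5.8, 19.2], [6.4, 18.5], [5.5, 18.0]])

plt.scatter(salmon[:, 0], salmon[:, 1], marker="o", label="salmon")
plt.scatter(sea_bass[:, 0], sea_bass[:, 1], marker="x", label="sea bass")
plt.xlabel("lightness")
plt.ylabel("width")
plt.legend()
plt.title("Feature space as a scatter plot")
plt.show()
```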

Good/Bad Features and Classification
The quality of a feature vector is related to its ability to discriminate between examples from different classes:
- Examples from the same class should have similar feature values.
- Examples from different classes should have different feature values.
(Figure: (a) the distinction between good and bad features; (b) feature properties: linear separability, non-linear separability, multi-modality, high correlation.)

Feature Dimension
- The curse of dimensionality: for a fixed amount of training data, the probability of misclassification of a decision rule stops decreasing (and can increase) once the dimension of the feature space grows beyond a certain point.
- Peaking phenomenon: adding features may actually degrade the performance of a classifier.
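The peaking phenomenon can be reproduced on synthetic data: with a fixed, small training set, padding two informative features with irrelevant noise features eventually hurts a simple k-NN classifier. A rough sketch, assuming scikit-learn is available:

```python
# Rough sketch of the peaking phenomenon: fixed sample size, growing number
# of features. Only the first two dimensions carry class information.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 40                                      # small, fixed sample size
y = np.repeat([0, 1], n // 2)
informative = np.vstack([rng.normal(0, 1, (n // 2, 2)),
                         rng.normal(2, 1, (n // 2, 2))])

for d in [2, 5, 20, 100, 500]:
    noise = rng.normal(0, 1, (n, d - 2))    # irrelevant extra features
    X = np.hstack([informative, noise])
    acc = cross_val_score(KNeighborsClassifier(3), X, y, cv=5).mean()
    print(f"d = {d:4d}  cross-validated accuracy = {acc:.2f}")
```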

Overfitting and Underfitting
The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not encountered (Mitchell, 1980). Too strong a bias yields a simpler model but may lead to underfitting; too weak a bias yields a more complex model and may lead to overfitting.
(Figure: underfitting, good fit, overfitting.)
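A minimal numerical illustration of the same idea: fitting polynomials of increasing degree to a few noisy samples of a smooth curve. A low degree underfits, a moderate degree fits well, and a high degree fits the noise (low training error, high test error).

```python
# Under/overfitting sketch: polynomial fits of increasing degree to noisy data.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```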

Overfitting and Underfitting: Fish Separation Example
(Figure: decision boundaries illustrating overfitting, underfitting, and a good fit.)

Types of Learning
- Unsupervised learning: the system forms clusters or natural groupings of the input patterns.
- Supervised learning: classify data using labeled samples (the labels are provided by a trainer).
- Semi-supervised learning: makes use of both labeled and unlabeled data for training, typically a small amount of labeled data together with a large amount of unlabeled data.
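The sketch below contrasts the three settings on the same toy data (values invented), using scikit-learn estimators as stand-ins: logistic regression for supervised learning, k-means for unsupervised learning, and self-training for semi-supervised learning.

```python
# Supervised vs. unsupervised vs. semi-supervised on the same synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.repeat([0, 1], 50)

# Supervised: every sample is labeled
supervised = LogisticRegression().fit(X, y)

# Unsupervised: no labels, form natural groupings (cluster ids are arbitrary)
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))

# Semi-supervised: only four labels, the rest marked -1 (unlabeled)
y_partial = np.full_like(y, -1)
y_partial[[0, 1, 50, 51]] = y[[0, 1, 50, 51]]
semi = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)

print("supervised accuracy:", (supervised.predict(X) == y).mean())
print("semi-supervised accuracy:", (semi.predict(X) == y).mean())
```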

Types of Learning (algorithmic viewpoint)
- Inductive: learns a labeling function over the whole space.
- Transductive: just labels the given test queries.
From Wikipedia: The goal is to predict appropriate labels for all of the unlabeled points (shown as "?"). An inductive learning algorithm will only have five labeled points to use as the basis for building a predictive model. For example, if a nearest-neighbor algorithm is used, then the points near the middle will be labeled "A" or "C" instead of "B". Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong, so the points in the middle would most likely be labeled "B". One disadvantage of transduction is that it builds no predictive model: if a previously unknown point is added to the set, the entire transductive algorithm must be repeated with all of the points in order to predict a label.
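The contrast can be sketched with scikit-learn (this is a loose analogue of the description above, not the exact Wikipedia figure): an inductive k-NN model can label a previously unseen point after training, whereas the transductive labeling produced by LabelPropagation covers only the points given at fit time, so a new point requires rerunning the fit.

```python
# Inductive (k-NN) vs. transductive (LabelPropagation) on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.4, (25, 2)), rng.normal(3, 0.4, (25, 2))])
y = np.full(50, -1)              # -1 marks unlabeled points
y[:2], y[25:27] = 0, 1           # only four labeled points

# Inductive: learn a labeling function from the labeled points alone,
# then apply it to a brand-new point.
knn = KNeighborsClassifier(n_neighbors=1).fit(X[y != -1], y[y != -1])
print(knn.predict([[2.8, 3.1]]))

# Transductive: label exactly the given (labeled + unlabeled) points.
lp = LabelPropagation(kernel="knn", n_neighbors=5).fit(X, y)
print(lp.transduction_[:5], lp.transduction_[25:30])

# Labeling a new point transductively means refitting with the point included.
lp_new = LabelPropagation(kernel="knn", n_neighbors=5).fit(
    np.vstack([X, [[2.8, 3.1]]]), np.append(y, -1))
print(lp_new.transduction_[-1])
```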

The Design Cycle
- Data collection
- Feature choice
- Model choice
- Training
- Evaluation
- Computational complexity

More on the Design Cycle
- Data collection: how much data is sufficient and representative?
- Feature choice: domain specific. Good features are simple to extract, invariant to irrelevant transformations, and insensitive to noise.
- Model choice: which model to choose for better performance?
- Training: which of the many different training procedures to use?
- Evaluation: measure the error rate.
- Computational complexity: what is the trade-off between computational ease and performance?
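A minimal sketch of the training and evaluation steps, assuming scikit-learn: hold out part of the (synthetic) data, train a classifier on the rest, and report the error rate on the held-out part.

```python
# Train/test split and error-rate measurement on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.repeat([0, 1], 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
error_rate = 1.0 - clf.score(X_te, y_te)
print(f"held-out error rate: {error_rate:.2%}")
```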

Pattern Recognition Approaches
- Statistical PR: based on the underlying statistical properties/model of patterns and pattern classes; uses numerical features for distinguishing between classes. Examples: Bayesian methods, neural networks, decision trees, support vector machines, etc.
- Structural (or syntactic) PR: based on an explicit or implicit representation of a class's structure; pattern classes are represented by means of formal structures such as grammars, automata, strings, graphs, trees, etc.
Reference: Syntactic and Structural Pattern Recognition: Theory and Applications, by Horst Bunke and Alberto Sanfeliu.

Pattern Recognition Approaches
Example: neural, statistical, and structural approaches to OCR.

Background Mathematical Review
In the TA class (attendance mandatory) you'll review the following mathematical concepts:
- Distribution functions and measures: distribution functions; moments; the covariance matrix; feature spaces (correlation, orthogonality, independence, etc.)
- Gaussian distribution: the Gaussian distribution function; the Central Limit Theorem
- Linear algebra: matrices (rank, determinant, inversion, differentiation, etc.); eigenvalues and eigenvectors
- Information theory: entropy, information gain, etc.
- Distances: axioms of a distance measure; distance measures (Euclidean, Mahalanobis, Minkowski, etc.); distance measures between distributions (Kullback-Leibler)
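As a small preview of the distance measures listed above, the sketch below computes them on toy numbers (SciPy's distance helpers are assumed to be available).

```python
# Euclidean, Minkowski, Mahalanobis distances and KL divergence on toy data.
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis, minkowski

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])     # covariance of the feature space

print("Euclidean  :", euclidean(x, y))
print("Minkowski-3:", minkowski(x, y, 3))
print("Mahalanobis:", mahalanobis(x, y, np.linalg.inv(cov)))

# Kullback-Leibler divergence between two discrete distributions
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.5, 0.4, 0.1])
print("KL(p || q) :", np.sum(p * np.log(p / q)))
```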

Road Map
- How to choose / extract features
- Dimensionality reduction
- Classification: probabilistic methods, linear discriminant methods, non-parametric methods, neural networks, support vector machines, kernel methods, graphical methods
- Clustering: partitioning methods, density-based methods, Expectation Maximization
- Semi-supervised learning
- Applications

Any questions?
End of Lecture 1. Thank you!
Spring 2012
http://ce.sharif.edu/courses/90-91/2/ce725-1/