L1: Course introduction


Outline
  - Introduction
    - Course organization
    - Grading policy
  - What is pattern recognition?
    - Definitions from the literature
    - Related fields and applications
  - Components of a pattern recognition system
    - Pattern recognition problems
    - Features and patterns
    - The pattern recognition design cycle
  - Pattern recognition approaches
    - Statistical
    - Neural
    - Structural
CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU

Course organization (1)
Instructor
  - Ricardo Gutierrez-Osuna
  - Office: 506A HRRB
  - Tel: (979) 845-2942
  - E-mail: rgutier@cse.tamu.edu
  - URL: http://faculty.cse.tamu.edu/rgutier
Grading
  - Homework: 3 assignments, every 3 weeks
  - Tests: 1 midterm, 1 final (comprehensive)
  - Term project: open-ended, public presentation

  Component     Weight (%)
  Homework      40
  Project       30
  Midterm       15
  Final Exam    15

Course organization (2)
Homework assignments
  - Start early, ideally the same day they are assigned
  - Do the assignments individually; code sharing is not allowed
  - Unless otherwise stated, you are to develop your own code
    - When in doubt about open-source or built-in libraries, ask!
  - To get an A in the homework, you must go beyond the assignment
  - Budget about 20 hours for each homework
Course project
  - Start early; do not wait until the day before proposals are due
  - Discuss your ideas with me early on
  - The ideal project has enough substance to be publishable in a reputable engineering conference
  - The ideal team consists of 3-4 people
  - Budget about 40 hours (per person) for the course project
  - You must be able to write in clear professional English

Course organization (3)
Prerequisites
  - Statistics, linear algebra, calculus (undergraduate level)
  - Experience with a programming language (C/C++, Java, Python)
Classroom etiquette
  - Arrive at the classroom on time to avoid disrupting others
  - No laptops, tablets or smartphones; lecture notes are available online
Other
  - This is NOT an easy class; you will have to work hard
  - No extra assignments to make up for poor grades

What is pattern recognition?
Definitions from the literature
  - "The assignment of a physical object or event to one of several prespecified categories" (Duda and Hart)
  - "A problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes" (Fukunaga)
  - "Given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples" (Ripley)
  - "The science that concerns the description or classification (recognition) of measurements" (Schalkoff)
  - "The process of giving names ω to observations x" (Schürmann)
  - "Pattern Recognition is concerned with answering the question 'What is this?'" (Morse)

Examples of pattern recognition problems
Machine vision
  - Visual inspection, ATR (automated target recognition)
  - Imaging device detects ground target
  - Classification into "friend" or "foe"
Character recognition
  - Automated mail sorting, processing bank checks
  - Scanner captures an image of the text
  - Image is converted into constituent characters
Computer aided diagnosis
  - Medical imaging, EEG, ECG signal analysis
  - Designed to assist (not replace) physicians
  - Example: X-ray mammography
    - 10-30% false negatives in x-ray mammograms
    - 2/3 of these could be prevented with proper analysis
Speech recognition
  - Human Computer Interaction, Universal Access
  - Microphone records acoustic signal
  - Speech signal is classified into phonemes and/or words
[Figure: acoustic signal waveform, amplitude vs. samples (x 10^4)]

Related fields and application areas for PR
Related fields
  - Adaptive signal processing
  - Machine learning
  - Artificial neural networks
  - Robotics and vision
  - Cognitive sciences
  - Mathematical statistics
  - Nonlinear optimization
  - Exploratory data analysis
  - Fuzzy and genetic systems
  - Detection and estimation theory
  - Formal languages
  - Structural modeling
  - Biological cybernetics
  - Computational neuroscience
Applications
  - Image processing
  - Computer vision
  - Speech recognition
  - Multimodal interfaces
  - Automated target recognition
  - Optical character recognition
  - Seismic analysis
  - Man and machine diagnostics
  - Fingerprint identification
  - Industrial inspection
  - Financial forecast
  - Medical diagnosis
  - ECG signal analysis

Components of a pattern recognition system
A basic pattern classification system contains
  - A sensor
  - A preprocessing mechanism
  - A feature extraction mechanism (manual or automated)
  - A classification algorithm
  - A set of examples (training set) already classified or described
[Figure: processing pipeline from "The real world" to "Analysis results"]
  - Measuring devices: sensors, cameras, databases
  - Preprocessing: noise filtering, feature extraction, normalization
  - Dimensionality reduction: feature selection, feature projection
  - Prediction: classification, regression, clustering, description
  - Model selection: cross-validation, bootstrap

Types of prediction problems
Classification
  - The PR problem of assigning an object to a class
  - The output of the PR system is an integer label
    - e.g. classifying a product as "good" or "bad" in a quality control test
Regression
  - A generalization of a classification task
  - The output of the PR system is a real-valued number
    - e.g. predicting the share value of a firm based on past performance and stock market indicators
Clustering
  - The problem of organizing objects into meaningful groups
  - The system returns a (sometimes hierarchical) grouping of objects
    - e.g. organizing life forms into a taxonomy of species
Description
  - The problem of representing an object in terms of a series of primitives
  - The PR system produces a structural or linguistic description
    - e.g. labeling an ECG signal in terms of P, QRS and T complexes

Features and patterns
Feature
  - A feature is any distinctive aspect, quality or characteristic
  - Features may be symbolic (e.g., color) or numeric (e.g., height)
  - Definitions
    - The combination of d features is a d-dimensional column vector called a feature vector: $x = [x_1, x_2, \ldots, x_d]^T$
    - The d-dimensional space defined by the feature vector is called the feature space
    - Objects are represented as points in feature space; the result is a scatter plot
Pattern
  - A pattern is a composite of traits or features characteristic of an individual
  - In classification tasks, a pattern is a pair of variables {x, ω} where
    - x is a collection of observations or features (feature vector)
    - ω is the concept behind the observation (label)
[Figure: feature vector; feature space (3D) with axes x_1, x_2, x_3; scatter plot (2D) of Class 1, Class 2 and Class 3 over Feature 1 and Feature 2]

What makes a "good" feature vector?
  - The quality of a feature vector is related to its ability to discriminate examples from different classes
    - Examples from the same class should have similar feature values
    - Examples from different classes should have different feature values
  [Figure: scatter plots of "good" features and "bad" features]
  - More feature properties
  [Figure: scatter plots illustrating linear separability, non-linear separability, highly correlated features, and multi-modal classes]

Classifiers
  - The task of a classifier is to partition feature space into class-labeled decision regions
    - Borders between decision regions are called decision boundaries
    - The classification of feature vector x consists of determining which decision region it belongs to, and assigning x to this class
  - A classifier can be represented as a set of discriminant functions
    - The classifier assigns a feature vector x to class $\omega_i$ if $g_i(x) > g_j(x) \ \forall j \neq i$
[Figure: feature space partitioned into decision regions R1-R4; block diagram in which features $x_1, x_2, x_3, \ldots, x_d$ feed discriminant functions $g_1(x), g_2(x), \ldots, g_C(x)$, followed by costs and a "select max" stage that produces the class assignment]
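The discriminant-function rule above amounts to an argmax over the C scores. The following is a minimal sketch, not part of the original slides: the prototype-based discriminants and the three prototype coordinates are hypothetical, chosen only so that the example runs.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Discriminant = Callable[[Vector], float]


def classify(x: Vector, discriminants: Sequence[Discriminant]) -> int:
    """Return the index i whose discriminant value g_i(x) is largest.

    Ties are resolved in favor of the lowest index.
    """
    scores = [g(x) for g in discriminants]
    return max(range(len(scores)), key=scores.__getitem__)


def make_prototype_discriminant(prototype: Vector) -> Discriminant:
    """Hypothetical discriminant: negative squared distance to a prototype."""
    def g(x: Vector) -> float:
        return -sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
    return g


# Made-up class prototypes in a 2-D feature space (illustration only).
prototypes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
discriminants = [make_prototype_discriminant(p) for p in prototypes]

print(classify((3.5, 0.2), discriminants))
```

With these discriminants each decision region is the set of points closest to one prototype, so the decision boundaries are the perpendicular bisectors between prototypes. Other choices of g_i give other boundaries.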

Pattern recognition approaches
Statistical
  - Patterns classified based on an underlying statistical model of the features
  - The statistical model is defined by a family of class-conditional probability density functions $p(x \mid \omega_i)$ (probability of feature vector x given class $\omega_i$)
Neural
  - Classification is based on the response of a network of processing units (neurons) to an input stimulus (pattern)
  - "Knowledge" is stored in the connectivity and strength of the synaptic weights
  - Trainable, non-algorithmic, black-box strategy
  - Very attractive since
    - it requires minimum a priori knowledge
    - with enough layers and neurons, ANNs can create any complex decision region
Syntactic
  - Patterns classified based on measures of structural similarity
  - "Knowledge" is represented by means of formal grammars or relational descriptions (graphs)
  - Used not only for classification, but also for description
  - Typically, syntactic approaches formulate hierarchical descriptions of complex patterns built up from simpler sub-patterns

Example: neural, statistical and structural OCR
[Figure: recognition of the character "A" by three approaches; Schalkoff, 1992]
  - Neural*: the character image is fed to a network
  - Statistical: feature extraction (# intersections, # right oblique lines, # left oblique lines, # horizontal lines, # holes) followed by a probabilistic model $p(x \mid \text{"A"})$, illustrated as a density over Feature #1 and Feature #2
  - Structural: the character is decomposed into primitives that are passed to a parser
  *Neural approaches may also employ feature extraction

A simple pattern recognition problem
Consider the problem of recognizing the letters L, P, O, E, Q
  - Determine a sufficient set of features
  - Design a tree-structured classifier

  Features
  Character   Vertical         Horizontal       Oblique          Curved
              straight lines   straight lines   straight lines   lines
  L           1                1                0                0
  P           1                0                0                1
  O           0                0                0                1
  E           1                3                0                0
  Q           0                0                1                1

[Figure: tree-structured classifier starting at "Start", with YES/NO decision nodes V>0?, C>0?, H>0? and O>0? and leaves P, E, L, Q, O]
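One tree that separates the five letters using the feature table above can be sketched as follows. The node order and the threshold on horizontal lines are assumptions made for this sketch (the layout of the tree in the slide figure is not recoverable from the text). In particular, the test `h > 1` is used to split E from L because the table lists one horizontal line for L and three for E.

```python
# Feature counts from the table: (vertical, horizontal, oblique, curved).
FEATURES = {
    "L": (1, 1, 0, 0),
    "P": (1, 0, 0, 1),
    "O": (0, 0, 0, 1),
    "E": (1, 3, 0, 0),
    "Q": (0, 0, 1, 1),
}


def classify_letter(v: int, h: int, o: int, c: int) -> str:
    """Tree-structured classifier over the four line-count features.

    The branch order and the h > 1 threshold are assumptions of this
    sketch, not taken verbatim from the slide.
    """
    if v > 0:                      # letters with a vertical stroke: L, P, E
        if c > 0:                  # only P also has a curved stroke
            return "P"
        return "E" if h > 1 else "L"
    return "Q" if o > 0 else "O"   # no vertical stroke: Q has an oblique tail


for letter, counts in FEATURES.items():
    print(letter, "->", classify_letter(*counts))
```

Note that this tree never needs more than three tests per character, and that the curved-line feature alone cannot separate P, O and Q.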

The pattern recognition design cycle
Data collection
  - Probably the most time-intensive component of a PR project
  - How many examples are enough?
Feature choice
  - Critical to the success of the PR problem
    - "Garbage in, garbage out"
  - Requires basic prior knowledge
Model choice
  - Statistical, neural and structural approaches
  - Parameter settings
Training
  - Given a feature set and a "blank" model, adapt the model to explain the data
  - Supervised, unsupervised and reinforcement learning
Evaluation
  - How well does the trained model do?
  - Overfitting vs. generalization

Consider the following scenario
  - A fish processing plant wants to automate the process of sorting incoming fish according to species (salmon or sea bass)
  - The automation system consists of
    - a conveyor belt for incoming products
    - two conveyor belts for sorted products
    - a pick-and-place robotic arm
    - a vision system with an overhead CCD camera
    - a computer to analyze images and control the robot arm
[Figure: CCD camera over the incoming conveyor belt, a computer, and a robot arm that moves fish to the salmon or bass conveyor belt; Duda, Hart and Stork, 2001]

Sensor
  - The vision system captures an image as a new fish enters the sorting area
Preprocessing
  - Image processing algorithms, e.g., adjustments for average intensity levels, segmentation to separate fish from background
Feature extraction
  - Suppose we know that, on average, sea bass are larger than salmon
    - From the segmented image we estimate the length of the fish
Classification
  - Collect a set of examples from both species
  - Compute the distribution of lengths for both classes
  - Determine a decision boundary (threshold) that minimizes the classification error
  - We estimate the classifier's probability of error and obtain a discouraging result of 40%
  - What do we do now?
[Figure: histograms (count vs. length) for salmon and sea bass, with the decision boundary between them]
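The threshold search in the classification step can be sketched as a scan over candidate boundaries. The lengths below are made up for illustration (they are not data from the slides), and the rule "length above the threshold means sea bass" follows the assumption stated in the feature extraction step.

```python
from typing import List, Tuple


def best_threshold(salmon: List[float], bass: List[float]) -> Tuple[float, float]:
    """Return (threshold, error rate) minimizing training misclassifications.

    A fish is labeled sea bass when its length exceeds the threshold.
    Candidates are midpoints between consecutive distinct observed lengths;
    ties are resolved in favor of the smallest threshold.
    """
    values = sorted(set(salmon) | set(bass))
    total = len(salmon) + len(bass)
    best_t, best_err = values[0], float("inf")
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        errors = sum(s > t for s in salmon) + sum(b <= t for b in bass)
        if errors / total < best_err:
            best_t, best_err = t, errors / total
    return best_t, best_err


# Hypothetical, heavily overlapping length samples (arbitrary units).
salmon_lengths = [30, 34, 36, 40, 44, 48]
bass_lengths = [38, 42, 46, 50, 54, 58]

threshold, error = best_threshold(salmon_lengths, bass_lengths)
print(threshold, error)
```

When the two length distributions overlap as much as they do here, no threshold achieves a low error, which is the situation the slide describes.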

Improving the performance of our PR system
  - Determined to achieve a recognition rate of 95%, we try a number of features
    - Width, area, position of the eyes w.r.t. mouth...
    - only to find out that these features contain no discriminatory information
  - Finally we find a "good" feature: average intensity of the scales
  [Figure: histograms (count vs. avg. scale intensity) for sea bass and salmon, with the decision boundary between them]
  - We combine "length" and "average intensity of the scales" to improve class separability
  - We compute a linear discriminant function to separate the two classes, and obtain a classification rate of 95.7%
  [Figure: scatter plot of length vs. avg. scale intensity for sea bass and salmon, with a linear decision boundary]
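The slide does not say how the linear discriminant was computed. As one possible illustration, the sketch below uses Fisher's linear discriminant on two features, with the boundary placed midway between the projected class means. The (length, intensity) samples are invented and cleanly separable, so they do not reproduce the 95.7% figure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: List[Point], m: Point) -> Tuple[float, float, float]:
    """Within-class scatter entries (s_xx, s_xy, s_yy) around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_fisher(class_a: List[Point], class_b: List[Point]):
    """Return (w, b) so that g(x) = w.x + b is positive on the class_b side."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy          # assumes the pooled scatter is invertible
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def predict(x: Point, w: Point, b: float) -> str:
    return "sea bass" if w[0] * x[0] + w[1] * x[1] + b > 0 else "salmon"


# Hypothetical (length, avg. scale intensity) samples.
salmon = [(30, 2.0), (34, 3.0), (38, 2.5), (42, 3.5)]
bass = [(36, 6.0), (40, 7.0), (44, 6.5), (48, 7.5)]

w, b = fit_fisher(salmon, bass)
correct = sum(predict(x, w, b) == "salmon" for x in salmon)
correct += sum(predict(x, w, b) == "sea bass" for x in bass)
accuracy = correct / (len(salmon) + len(bass))
print(w, b, accuracy)
```

In this toy data neither feature alone separates the classes as cleanly as the combination does: lengths overlap between 36 and 42, while the discriminant weights intensity heavily and corrects for length.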

Cost vs. classification rate
  - Our linear classifier was designed to minimize the overall misclassification rate
  - Is this the best objective function for our fish processing plant?
    - The cost of misclassifying salmon as sea bass is that the end customer will occasionally find a tasty piece of salmon when he purchases sea bass
    - The cost of misclassifying sea bass as salmon is an end customer upset when he finds a piece of sea bass purchased at the price of salmon
  - Intuitively, we could adjust the decision boundary to minimize this cost function
[Figure: two scatter plots of length vs. avg. scale intensity for sea bass and salmon, one with the original decision boundary and one with the new decision boundary]
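The effect of unequal misclassification costs on the boundary can be illustrated with a one-feature threshold. This is a sketch under stated assumptions: the lengths and the 5:1 cost ratio are invented, and a single length threshold stands in for the two-feature linear boundary of the slide.

```python
from typing import List, Tuple


def min_cost_threshold(
    salmon: List[float],
    bass: List[float],
    cost_salmon_as_bass: float = 1.0,
    cost_bass_as_salmon: float = 1.0,
) -> Tuple[float, float]:
    """Return (threshold, total cost) minimizing the weighted error count.

    A fish is labeled sea bass when its length exceeds the threshold.
    Ties are resolved in favor of the smallest threshold.
    """
    values = sorted(set(salmon) | set(bass))
    best_t, best_cost = values[0], float("inf")
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        cost = cost_salmon_as_bass * sum(s > t for s in salmon)
        cost += cost_bass_as_salmon * sum(b <= t for b in bass)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical lengths; one short sea bass (38) sits among the salmon.
salmon_lengths = [30, 32, 34, 36, 40, 42]
bass_lengths = [38, 44, 46, 48, 50, 52]

equal = min_cost_threshold(salmon_lengths, bass_lengths)
weighted = min_cost_threshold(salmon_lengths, bass_lengths, 1.0, 5.0)
print(equal, weighted)
```

With equal costs the best threshold tolerates selling the short sea bass as salmon. Once that error is made five times as expensive, the boundary moves down so that two salmon are sold as sea bass instead.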

The issue of generalization
  - The recognition rate of our linear classifier (95.7%) met the design specs, but we still think we can improve the performance of the system
  - We then design an ANN with five hidden layers, a combination of logistic and hyperbolic tangent activation functions, train it with the Levenberg-Marquardt algorithm and obtain an impressive classification rate of 99.9975% with the following decision boundary
  [Figure: scatter plot of length vs. avg. scale intensity for sea bass and salmon, with a highly irregular decision boundary]
  - Satisfied with our classifier, we integrate the system and deploy it to the fish processing plant
  - After a few days, the plant manager calls to complain that the system is misclassifying an average of 25% of the fish
  - What went wrong?
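A gap like the one in this story can be detected before deployment by comparing accuracy on the training examples with accuracy on examples held out from training. The sketch below is not the ANN from the slide: a 1-nearest-neighbor rule is swapped in as a simple classifier that memorizes its training set, and the synthetic class distributions are made-up values.

```python
import random
from typing import List, Tuple

Example = Tuple[Tuple[float, float], str]


def nearest_neighbor_label(x: Tuple[float, float], training: List[Example]) -> str:
    """Label of the training example closest to x (1-NN, squared Euclidean)."""
    _, label = min(
        training,
        key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2,
    )
    return label


def accuracy(examples: List[Example], training: List[Example]) -> float:
    hits = sum(nearest_neighbor_label(x, training) == y for x, y in examples)
    return hits / len(examples)


def sample_fish(rng: random.Random, n: int) -> List[Example]:
    """Draw n fish per class from overlapping Gaussians (hypothetical values)."""
    data: List[Example] = []
    for _ in range(n):
        data.append(((rng.gauss(36, 6), rng.gauss(3.0, 1.5)), "salmon"))
        data.append(((rng.gauss(42, 6), rng.gauss(5.0, 1.5)), "sea bass"))
    return data


rng = random.Random(0)
train = sample_fish(rng, 100)
held_out = sample_fish(rng, 100)

train_acc = accuracy(train, train)        # each point is its own nearest neighbor
held_out_acc = accuracy(held_out, train)  # estimate of performance on new fish
print(train_acc, held_out_acc)
```

The training accuracy is perfect by construction, so it says nothing about new fish. The held-out figure is the one to compare against the design specification.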