EE-002 Computational Learning & Pattern Recognition

Where or how to find me? Associate Prof. Dr. Turgay IBRIKCI, Çukurova University, Electrical-Electronics Engineering Department, Room #305, Thursdays 9:30-12:00, (322) 338 6868 / 139, turgayibrikci@hotmail.com

Course Outline. The course is divided in two parts: theory and practice. 1. Theory covers basic topics in pattern recognition theory and applications with computational learning. 2. Practice deals with the basics of MATLAB and the implementation of pattern recognition algorithms. We assume that you know MATLAB or will learn it on your own.

Course Grading. Project 40% (report and presentation, Week 14, 20 minutes). Final exam 20% (Week 15; we decide together). Homework 40% (at least 4 homework assignments). Full attendance of the class 10% (required bonus).

In This Course. What is pattern recognition? How should objects to be classified be represented? What algorithms can be used for recognition (or matching)? How should learning (training) be done? Much of the material concerns statistical classification methods, including generative methods such as those based on Bayes decision theory and the related techniques of parameter estimation and density estimation. The algorithms are applied with MATLAB.

Pattern recognition is "the assignment of a physical object or event to one of several prespecified categories" (Duda & Hart). A pattern is an object, process or event that can be given a name. A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source. During recognition (or classification), given objects are assigned to prescribed classes. A classifier is a machine which performs classification.

Examples of applications. Optical Character Recognition (OCR): handwritten text (sorting letters by postal code, input devices for PDAs) and printed text (reading machines for blind people, digitization of text documents). Biometrics: face recognition, verification and retrieval; fingerprint recognition. Speech recognition. Diagnostic systems: medical diagnosis (X-ray, EKG analysis), machine diagnostics, waste detection. Military applications: Automated Target Recognition (ATR), image segmentation and analysis (recognition from aerial or satellite photographs).

What are Patterns? The laws of physics and chemistry generate patterns, for example patterns in astronomy. Humans tend to see patterns everywhere. Patterns in biology: applications include biometrics, computational anatomy and brain mapping. Patterns of brain activity: relations between brain activity, emotion, cognition and behaviour. Variations of patterns: patterns vary with expression, lighting and occlusions.

Speech Patterns. Acoustic signals (examples).

Goal of Pattern Recognition. Recognize patterns and make decisions about patterns. Visual example: is this person happy or sad? Speech example: did the speaker say "Yes" or "No"? Physics example: is this an atom or a molecule?

Approaches. Statistical PR: based on an underlying statistical model of patterns and pattern classes. Structural (or syntactic) PR: pattern classes are represented by means of formal structures such as grammars, automata, strings, etc. Neural networks: the classifier is represented as a network of cells modeling neurons of the human brain (connectionist approach).

Basic concepts. A pattern is described by a feature vector x = (x_1, x_2, ..., x_n), a vector of observations (measurements); x is a point in the feature space X. The hidden state y ∈ Y cannot be measured directly; patterns with the same hidden state belong to the same class. Task: to design a classifier (decision rule) q: X → Y which decides about the hidden state based on an observation.

Example: jockey-hoopster recognition with a linear classifier. The features are height (x_1) and weight (x_2), the set of hidden states is Y = {H, J}, the feature space is X = R^2, and the training examples are {(x_1, y_1), ..., (x_l, y_l)}. The decision rule is q(x) = H if (w · x) + b ≥ 0 and q(x) = J if (w · x) + b < 0, so the decision boundary is the line (w · x) + b = 0.

Example: Salmon versus Sea Bass. Generative methods attempt to model the full appearance of salmon and sea bass. Discriminative methods extract features sufficient to make the decision (e.g. length and brightness). Fish features: length (salmon are usually shorter than sea bass) and lightness (sea bass are usually brighter than salmon).
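To make the jockey-hoopster linear decision rule concrete, here is a minimal sketch (the course itself uses MATLAB; these illustrative sketches use Python, and the weights, bias and feature values below are invented for illustration):

```python
import numpy as np

def q(x, w, b):
    """Linear decision rule: 'H' (hoopster) if w.x + b >= 0, otherwise 'J' (jockey)."""
    return 'H' if np.dot(w, x) + b >= 0 else 'J'

# Hypothetical weights and feature vectors (height in cm, weight in kg).
w = np.array([0.04, 0.02])   # assumed weight vector
b = -9.0                     # assumed bias
print(q(np.array([200, 95]), w, b))  # tall, heavy  -> 'H'
print(q(np.array([160, 52]), w, b))  # short, light -> 'J'
```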

Components of a PR system. Pattern → sensors and preprocessing → feature extraction → classifier → class assignment. A teacher provides information about the hidden state (supervised learning), and a learning algorithm sets up the PR system from training examples. Sensors and preprocessing come first; feature extraction aims to create discriminative features good for classification; the classifier then assigns a class.

Feature extraction. Task: to extract features which are good for classification. Good features: objects from the same class have similar feature values, and objects from different classes have different values. (Illustration: good features versus bad features.)

Feature extraction methods. Feature extraction maps the measurements m_1, ..., m_k through functions φ_1, ..., φ_n to features x_1, ..., x_n; feature selection instead picks a subset of the measurements as features. The problem can be expressed as optimization of the parameters of a feature extractor φ(θ). Supervised methods: the objective function is a criterion of separability (discriminability) of labeled examples, e.g. linear discriminant analysis (LDA). Unsupervised methods: a lower-dimensional representation which preserves important characteristics of the input data is sought, e.g. principal component analysis (PCA).

Classifier. A classifier partitions the feature space X into class-labeled regions X_1, X_2, ..., X_|Y| such that X = X_1 ∪ X_2 ∪ ... ∪ X_|Y| and the regions are mutually disjoint. Classification consists of determining to which region a feature vector x belongs. Borders between decision regions are called decision boundaries.

Representation of a classifier. A classifier is typically represented as a set of discriminant functions f_i(x): X → R, i = 1, ..., |Y|. The classifier assigns a feature vector x to the i-th class if f_i(x) > f_j(x) for all j ≠ i, i.e. the class identifier is y = argmax_i f_i(x). (A small code sketch follows below.)

An Introduction to Bayesian Decision Theory. Bayesian Decision Theory came long before version spaces, decision tree learning and neural networks. It was studied in the field of statistical theory and, more specifically, in the field of pattern recognition. Bayesian Decision Theory is at the basis of important learning schemes such as the Naïve Bayes classifier, learning Bayesian belief networks, and the EM algorithm.
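Returning to the discriminant-function representation above: a minimal sketch that classifies by taking the argmax over a few linear discriminant functions; the weight matrix and bias values are invented for illustration.

```python
import numpy as np

def classify(x, W, b):
    """Assign x to the class whose discriminant f_i(x) = w_i.x + b_i is largest."""
    scores = W @ x + b          # one discriminant value per class
    return int(np.argmax(scores))

# Three hypothetical linear discriminants over a 2-D feature space.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.5])
print(classify(np.array([2.0, -1.0]), W, b))  # -> class 0
```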

Bayesian decision making. Bayesian decision making is a fundamental statistical approach which allows one to design the optimal classifier if the complete statistical model is known. Definition: observations x ∈ X, hidden states y ∈ Y, decisions d ∈ D, a loss function W: Y × D → R, a decision rule q: X → D, and a joint probability p(x, y). Task: to design a decision rule q which minimizes the Bayesian risk R(q) = Σ_{y∈Y} Σ_{x∈X} p(x, y) W(q(x), y).

Bayes Theorem. Goal: to determine the most probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H. Prior probability of h, P(h): reflects any background knowledge we have about the chance that h is a correct hypothesis (before having observed the data). Prior probability of D, P(D): reflects the probability that the training data D will be observed given no knowledge about which hypothesis h holds. Conditional probability of observation D, P(D|h): denotes the probability of observing data D given some world in which hypothesis h holds.

Bayes Theorem (cont'd). Posterior probability of h, P(h|D): represents the probability that h holds given the observed training data D. It reflects our confidence that h holds after we have seen the training data D, and it is the quantity that machine learning researchers are interested in. Bayes Theorem allows us to compute P(h|D) = P(D|h) P(h) / P(D). (A short numeric sketch follows below.)

Bayesian Belief Networks. The Bayes optimal classifier is often too costly to apply. The Naïve Bayes classifier uses the conditional independence assumption to defray these costs; however, in many cases such an assumption is overly restrictive. Bayesian belief networks provide an intermediate approach which allows stating conditional independence assumptions that apply to subsets of the variables.

Representation in Bayesian Belief Networks. (Figure: example network with nodes Storm, Lightning, Thunder, BusTourGroup, Campfire and ForestFire.) Associated with each node is a conditional probability table, which specifies the conditional distribution for the variable given its immediate parents in the graph. Each node is asserted to be conditionally independent of its non-descendants, given its immediate parents.

Inference in Bayesian Belief Networks. A Bayesian network can be used to compute the probability distribution for any subset of network variables given the values or distributions for any subset of the remaining variables. Unfortunately, exact inference of probabilities for an arbitrary Bayesian network is known to be NP-hard. In theory, approximate techniques (such as Monte Carlo methods) can also be NP-hard, though in practice many such methods have been shown to be useful.
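Returning to Bayes' theorem above, P(h|D) = P(D|h)P(h)/P(D): a quick numeric sketch with made-up prior and likelihood values.

```python
# Hypothetical numbers: prior P(h) = 0.3, likelihood P(D|h) = 0.8,
# and P(D|not h) = 0.2, so P(D) = 0.8*0.3 + 0.2*0.7 = 0.38.
p_h = 0.3
p_d_given_h = 0.8
p_d_given_not_h = 0.2
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 3))  # 0.632: the posterior is higher than the prior
```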

Example of a Bayesian task. Task: minimization of the classification error. The set of decisions D is the same as the set of hidden states Y, and the 0/1 loss function is used: W(q(x), y) = 0 if q(x) = y, and 1 if q(x) ≠ y. The Bayesian risk R(q) then corresponds to the probability of misclassification. The solution of the Bayesian task is q* = argmin_q R(q), with q*(x) = argmax_y p(y|x) = argmax_y p(x|y) p(y) / p(x).

Limitations of the Bayesian approach. The statistical model p(x, y) is mostly not known, therefore learning must be employed to estimate p(x, y) from training examples {(x_1, y_1), ..., (x_l, y_l)} (plug-in Bayes). Non-Bayesian methods offer further task formulations: only a partial statistical model is available; p(y) is not known or does not exist; p(x|y) is influenced by a non-random intervention; the loss function is not defined. Examples: Neyman-Pearson's task, the minimax task, etc.

Discriminative approaches. Given a class of classification rules q(x; θ) parametrized by θ, the task is to find the best parameter θ* based on a set of training examples {(x_1, y_1), ..., (x_l, y_l)} (supervised learning). The task of learning is to recognize which classification rule is to be used; how the learning is performed is determined by a selected inductive principle.

Learning theory. Both generative and discriminative methods require training data to learn the models, features and decision rules. Machine learning concentrates on learning discrimination rules. Key issue: do we have enough training data to learn?

Empirical risk minimization principle. The true expected risk R(q) is approximated by the empirical risk R_emp(q(x; θ)) = (1/l) Σ_{i=1}^{l} W(q(x_i; θ), y_i) with respect to a given labeled training set {(x_1, y_1), ..., (x_l, y_l)}. Learning based on the empirical risk minimization principle is defined as θ* = argmin_θ R_emp(q(x; θ)). Examples of algorithms: Perceptron, back-propagation, etc. Problem: how rich a class of classification rules q(x; θ) to use.

Overfitting and underfitting. Underfitting, good fit, overfitting. Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
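To make the empirical risk concrete, here is a minimal sketch of R_emp with the 0/1 loss; the toy data and the threshold rule q(x; θ) are invented for illustration.

```python
import numpy as np

def empirical_risk(predict, X, y):
    """R_emp(q) = (1/l) * sum_i W(q(x_i), y_i) with the 0/1 loss."""
    losses = [0 if predict(x) == yi else 1 for x, yi in zip(X, y)]
    return sum(losses) / len(losses)

# Hypothetical 1-D data and a simple threshold rule q(x; theta) with theta = 1.0.
X = np.array([[0.2], [0.8], [1.5], [2.3]])
y = np.array([0, 0, 1, 1])
rule = lambda x: int(x[0] > 1.0)
print(empirical_risk(rule, X, y))  # 0.0 on this toy set
```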

Structural risk minimization principle (statistical learning theory, Vapnik & Chervonenkis). An upper bound on the expected risk of a classification rule q ∈ Q: R(q) ≤ R_emp(q) + R_str(l, h, log(1/δ)), where l is the number of training examples, h is the VC-dimension of the class of functions Q, and 1-δ is the confidence of the upper bound. SRM principle: from a given nested sequence of function classes Q_1, Q_2, ..., Q_m, such that h_1 ≤ h_2 ≤ ... ≤ h_m, select the rule q* which minimizes the upper bound on the expected risk.

Machine Learning is... Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. "Machine learning is programming computers to optimize a performance criterion using example data or past experience." -- Ethem Alpaydin. "The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest." -- Kevin P. Murphy. "Machine learning is about predicting the future based on the past." -- Hal Daume III. "The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions." -- Christopher M. Bishop.

Supervised learning examples. "Machine learning is about predicting the future based on the past." -- Hal Daume III. Labeled examples from the past (label 1, ..., label 5) are used in a training phase to build a model/predictor, which is then applied to future data in a testing phase. Supervised learning: given labeled examples.

Supervised learning. During training, labeled examples (label 1, ..., label 5) are used to build a model/predictor; the model/predictor is then used to predict the label of a new example. Supervised learning: given labeled examples, learn to predict new examples.

Supervised learning: classification. Classification: the label comes from a finite set of labels (e.g. apple, apple, banana, banana). Classification example: differentiate between low-risk and high-risk customers from their income and savings.

Supervised learning: regression. Regression: the label is real-valued (e.g. -4.5, 10.1, 3.2, 4.3). Regression example: the price of a used car; x: car attributes (e.g. mileage), y: price, with a linear model y = wx + w_0.
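For the used-car regression model y = wx + w_0, the parameters can be fit by ordinary least squares; a minimal sketch with invented mileage/price pairs.

```python
import numpy as np

# Hypothetical training data: x = mileage (in 1000 km), y = price (in $1000).
x = np.array([10.0, 50.0, 90.0, 130.0])
y = np.array([18.0, 14.0, 10.0, 6.0])

# Least-squares fit of y = w*x + w0.
w, w0 = np.polyfit(x, y, 1)
print(round(w, 3), round(w0, 3))   # slope and intercept
print(round(w * 70.0 + w0, 1))     # predicted price for 70,000 km
```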

Regression applications. Economics/finance: predict the value of a stock. Epidemiology. Car/plane navigation: angle of the steering wheel, acceleration. Temporal trends: weather over time.

Supervised learning: ranking. Ranking: the label is a ranking (e.g. 1, 4, 2, 3). Ranking example: given a query and a set of web pages, rank them according to relevance. Supervised learning: given labeled examples.

Unsupervised learning. Unsupervised learning: given data, i.e. examples, but no labels. Unsupervised learning applications: learn clusters/groups without any label, customer segmentation (i.e. grouping), image compression, bioinformatics (learn motifs).

Unsupervised learning (formal setting). Input: training examples {x_1, ..., x_l} without information about the hidden state. Clustering: the goal is to find clusters of data sharing similar properties. In a broad class of unsupervised learning algorithms, a learning algorithm produces parameters θ from the unlabeled data {x_1, ..., x_l}, and a classifier q: X × Θ → Y then assigns labels {y_1, ..., y_l}; compare with the supervised case, where the learning algorithm is L: (X × Y)^l → Θ.

Example of an unsupervised learning algorithm: k-means clustering. Given data {x_1, ..., x_l} and k cluster means θ = {m_1, ..., m_k}, the classifier assigns each point to its nearest mean, y = q(x) = argmin_{i=1,...,k} ||x - m_i||, and the learning algorithm recomputes each mean as m_i = (1/|I_i|) Σ_{j∈I_i} x_j, where I_i = {j : q(x_j) = i}. The goal is to minimize Σ_{i=1}^{l} ||x_i - m_{q(x_i)}||^2. (See the short code sketch after the references.)

References. Books: Theodoridis, Koutroumbas: Pattern Recognition (4th edition, 2004). Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982 (2nd edition 2000). Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990. Bishop: Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1997. Schlesinger, Hlaváč: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002. Slides: Vojtěch Franc. Journals: Journal of the Pattern Recognition Society; IEEE Transactions on Neural Networks; Pattern Recognition and Machine Learning.
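Returning to the k-means algorithm described before the references: a minimal sketch of the two alternating steps (nearest-mean assignment and mean update), with invented 2-D data and k = 2.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Alternate q(x) = argmin_i ||x - m_i|| and m_i = mean of the points assigned to i."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest mean.
        labels = np.argmin(np.linalg.norm(X[:, None] - means[None, :], axis=2), axis=1)
        # Update step: each mean becomes the average of its assigned points.
        for i in range(k):
            if np.any(labels == i):
                means[i] = X[labels == i].mean(axis=0)
    return means, labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
means, labels = kmeans(X, k=2)
print(labels)  # two clusters: the points near the origin and the points near (5, 5)
```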