COEN 296 Topics in Computer Engineering
Introduction to Pattern Recognition and Data Mining

Instructor: Dr. Giovanni Seni (G.Seni@ieee.org)
Department of Computer Engineering, Santa Clara University
G.Seni, Q1/04

Overview
  - Course Goals & Syllabus
  - Pattern Recognition
  - Features
  - Classification
  - Generalization
  - System components
  - Related Fields: ML & DM
  - Design Cycle
  - Computational Complexity
  - The R Language

Course Goals
  - Convey excitement about an immensely useful field
      - Large increase in digital data (barcode scanners, e-commerce, etc.)
      - Moore's Law
  - Provide foundation for further study/research
  - Expose to real data
  - Introduce you to toolbox of methods

Syllabus
  Class dates: Jan 6, Jan 13, Jan 20, Jan 27, Feb 3, Feb 10, Feb 17, Feb 24, Mar 2, Mar 9
  Topics (textbook sections in parentheses):
  - Bayesian Decision Theory (2.1-2.6, 2.9)
  - Parameter Estimation (3.1-3.4; see also 4.5 HMS)
  - Linear Discriminant Functions (3.8.2, 5.1-5.8)
  - Neural Networks (6.1-6.5)
  - Neural Networks (6.6, 6.8)
  - Clustering (10.6, 10.7; see also 9.3-9.6 HMS)
  - Clustering (10.9)
  - Non-metric: Association Rules (5.3.2 HMS)
  - Text Retrieval (14.1-14.3 HMS)

Pattern Recognition
  - The act of taking in raw data and taking an action based on the category of the pattern
  - Example: sorting incoming fish on a conveyor according to species using optical sensing
      - category-1: sea bass
      - category-2: salmon
  - Useful applications
      - Speech recognition
      - Word & character recognition: OCR (Optical Character Recognition)
      - Fingerprint identification (biometrics)
      - DNA sequence identification (bioinformatics)
      - Fraud detection
      - etc.

Feature Extraction
  - Representation in which patterns that lead to the same action are "close" to one another, yet "far" from those that demand a different action, i.e., discriminative
  - Data reduction
  - Features to explore: length, lightness, width, number and shape of fins, position of the mouth, etc.
  - Initial model: sea bass is generally longer and lighter than salmon
  - Histograms on training samples

  ID          1     2     3
  Class
  length      7.8   19.1  5.6
  lightness   3.1   7.9   4.2
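As a rough illustration of the "initial model" idea above, the sketch below picks a length threshold that minimizes the error on training samples. The data are simulated and purely hypothetical (the means and spreads are assumptions, not measurements from the course), and `error_rate` is an illustrative helper name, not part of any library.

```r
# Hypothetical training data: lengths are simulated for illustration only.
set.seed(1)
fish <- data.frame(
  species = rep(c("salmon", "sea bass"), each = 50),
  length  = c(rnorm(50, mean = 8, sd = 2), rnorm(50, mean = 14, sd = 3))
)

# Training error of the rule "call it sea bass if length > t".
error_rate <- function(t, data) {
  predicted <- ifelse(data$length > t, "sea bass", "salmon")
  mean(predicted != data$species)
}

# Try every observed length as a candidate threshold and keep the best one.
candidates <- sort(unique(fish$length))
errors     <- sapply(candidates, error_rate, data = fish)
best_t     <- candidates[which.min(errors)]

best_t        # chosen threshold
min(errors)   # training error at that threshold
```

Calling `hist(fish$length[fish$species == "salmon"])` and the same for sea bass gives the per-class histograms the slide refers to. Overlap between them is why a single feature rarely gives zero error.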

Feature Space
  - Fish x = (x1, x2), with x1 = lightness and x2 = width

Classification
  - Separate feature space into regions corresponding to the classes
  - The separating boundary is called the decision boundary
  - Perfect classification is often impossible: use a probability framework
      - Easy to incorporate priors and misclassification costs

Generalization
  - Ability to correctly classify novel input
  - Tradeoff between decision model complexity and generalization performance
      - Complex model: lower training error, higher test error
      - Simpler model: higher training error, lower test error

Pattern Recognition System
  input -> sensing -> segmentation -> feature extraction -> classification -> post-processing -> decision
  - Sensing: converts physical inputs into signal data
      - Bandwidth, resolution, sensitivity, and distortion of the transducer impose limitations on the system
  - Segmentation: isolates objects from background or other objects
  - Post-processing: accounts for context and cost of errors
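The Classification slide notes that a probability framework makes it easy to incorporate priors and misclassification costs. A minimal sketch of that idea for a single feature follows, assuming Gaussian class-conditional densities. Every number here (priors, means, standard deviations, costs) is made up for illustration, and `classify` is a hypothetical helper.

```r
# Hypothetical class models for one feature (lightness).
classes <- c("salmon", "sea_bass")
prior   <- c(0.6, 0.4)    # assumed prior probability of each class
mu      <- c(3.0, 6.0)    # assumed class means
sigma   <- c(1.0, 1.5)    # assumed class standard deviations

# cost[i, j] = cost of deciding class i when the true class is j.
cost <- matrix(c(0, 1,
                 2, 0), nrow = 2, byrow = TRUE)

classify <- function(x) {
  likelihood <- dnorm(x, mean = mu, sd = sigma)            # p(x | class)
  posterior  <- prior * likelihood / sum(prior * likelihood)
  risk       <- as.vector(cost %*% posterior)              # expected cost per decision
  classes[which.min(risk)]                                 # least costly decision
}

classify(2)   # a dark fish
classify(9)   # a light fish
```

With equal off-diagonal costs this reduces to picking the class with the largest posterior. Raising one cost shifts the decision boundary away from the more expensive mistake.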

Related Disciplines
  - Data Mining: produce insight and understanding about the structure of large observational datasets, e.g.,
      - Find interesting relationships
      - Summarize the data in new ways that are understandable and actionable
  - Machine Learning: how to construct computer programs that automatically improve with experience (Mitchell)
      - Theory and algorithms
  - Other: statistics, information theory, etc.

Related Disciplines (2)
  Data Mining Algorithm Components
  - Task: visualization, classification, clustering, regression, rule discovery
  - Structure: functional form of the model we are fitting to the data (e.g., linear, hierarchical)
  - Score function: goodness-of-fit function we are using to judge the quality of our fitted model on observed data
  - Search/optimization method: computational procedure used to find the maximum (or minimum) of the score function for a particular model
  - Data management technique: location and manner in which data is accessed

Design Cycle
  - Representative set of examples for training and testing the system
  - Can account for a large part of the development cost
  - Data matrix: n x d

  ID   Age  Sex     Marital Status  Education     Income
  248  54   Male    Married         High school   100000
  249  ??   Female  Married         High school   12000
  250  29   Male    Married         Some college  23000

Design Cycle (2)
  - Feature choice
      - Useful for discriminating
      - Easy to extract
      - Invariant to irrelevant transformations
      - Insensitive to noise
  - Type
      - Quantitative: measured on a numerical scale
      - Categorical: nominal and ordinal (possessing a natural order)
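The data matrix and feature types above map naturally onto an R data frame. The sketch below enters the three records from the slide, using `NA` for the unknown age. Treating Education as ordinal with "High school" below "Some college" is an assumption made here for illustration, as is using the ID as a row name rather than a feature.

```r
# The three records from the slide; NA marks the unknown age ("??").
x <- data.frame(
  Age       = c(54, NA, 29),
  Sex       = factor(c("Male", "Female", "Male")),               # nominal
  Marital   = factor(c("Married", "Married", "Married")),        # nominal
  Education = factor(c("High school", "High school", "Some college"),
                     levels  = c("High school", "Some college"),
                     ordered = TRUE),                            # ordinal (assumed order)
  Income    = c(100000, 12000, 23000),
  row.names = c("248", "249", "250")
)

dim(x)                      # n patterns by d features
sapply(x, is.numeric)       # TRUE for quantitative columns
colSums(is.na(x))           # missing values per feature
mean(x$Age, na.rm = TRUE)   # summary that skips the missing age
```

Declaring the type of each column up front lets later steps treat quantitative, nominal, and ordinal features differently, for example when choosing a distance measure.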

Design Cycle (3)
  - Predictive modeling: the value of one variable is predicted from the known values of other variables (classification, regression)
      - E.g., a nonlinear model Y = ax^2 + bx + c
  - Descriptive modeling: clustering and segmentation, dependency modeling, probability density estimation

Design Cycle (4)
  - Training: using training patterns to learn or estimate the parameters of the model (supervised or unsupervised)
  - Score function: quantifies how well the model fits a given data set
      - E.g., likelihood, sum of squared errors, misclassification rate
  - Optimization (or search) method: determines the parameter values that achieve a minimum (or maximum) of the score function
      - E.g., gradient descent

Design Cycle (5)
  - Evaluation: measure performance and adjust components appropriately
      - Train vs. test error
      - Overfitting
      - Bias-variance tradeoff

Dimensionality
  - Classification accuracy depends upon the dimensionality and the amount of training data
  - Theoretically, error rate can be reduced by introducing new, independent features
  - Need features that help separate the class pairs most frequently confused (e.g., distance between class means)
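To connect the model, score function, and optimization method from the Design Cycle slides, here is a small sketch that fits Y = ax^2 + bx + c by gradient descent on the mean squared error. The data are simulated (the "true" coefficients and noise level are assumptions), and the step size and iteration count were simply chosen to be safe for this data set.

```r
# Hypothetical data generated from a known quadratic plus noise.
set.seed(1)
x <- seq(-1, 1, length.out = 21)
y <- 2 * x^2 - x + 0.5 + rnorm(length(x), sd = 0.1)

X <- cbind(x^2, x, 1)   # one column per parameter (a, b, c)

# Score function: mean squared error of the fitted curve.
score <- function(theta) mean((X %*% theta - y)^2)

# Gradient of the score with respect to (a, b, c).
gradient <- function(theta) {
  as.vector(2 * t(X) %*% (X %*% theta - y)) / nrow(X)
}

theta <- c(a = 0, b = 0, c = 0)   # starting guess
step  <- 0.1                      # fixed step size (assumed adequate here)
for (i in 1:5000) {
  theta <- theta - step * gradient(theta)
}

theta          # estimated a, b, c
score(theta)   # final value of the score function
```

Because this model is linear in its parameters, `lm(y ~ I(x^2) + x)` finds the same minimum in closed form. Gradient descent is shown only because the same loop applies to models with no closed-form solution.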

Dimensionality (2)
  - Practical paradox: beyond a certain point, the inclusion of additional features leads to worse performance
  - Source of difficulty
      - Wrong model, e.g., Gaussian assumption, independence assumption
      - Inadequate number of training samples: distributions are not estimated accurately

Computational Complexity
  - Time/space considerations are of considerable practical importance at each stage
      - A table lookup might result in error-free recognition but be impractical
  - Scalability as a function of:
      - Number of features (d)
      - Number of patterns (n)
      - Number of classes (c)
  - Learning vs. decision-making time

The R Language
  - An open source version of S, a language and environment for data analysis
  - http://www.r-project.org/
  - Library provides many datasets
  - Sample commands:

      > x <- read.table("mydata.txt", header = TRUE)
      > dim(x)
      [1] 8192   18
      > x[5, 7:9]
         P S  K
      5 11 4 12
      > hist(x[,7], breaks=100, xlab="amount", main="P")

The R Language (2)
  Other useful functions:
  - Input/Output: read.table, read.delim, scan, write, write.table
  - Extraction: which, apply
  - Names: row.names, colnames, names
  - Plots: hist, plot, points, lines, pdf, dev.off
  - Error catching: stop, warning
  - Sizes: dim, nrow, ncol, length
  - Math: sum, mean, cor, log, max, min, range
  - Casts: as.matrix, as.vector, as.numeric
  - Type test: is.matrix, is.vector, is.numeric, is.data.frame
  - Ordering: sort, order
  - Help: ?command
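Returning to the table-lookup remark under Computational Complexity, a few lines of R arithmetic suggest why such a table becomes impractical as the number of features grows. The sketch assumes each feature is quantized into 10 bins (an arbitrary choice) and reuses n = 8192 patterns and d = 18 columns from the sample `dim(x)` output, treating all 18 columns as features purely for illustration.

```r
bins_per_feature <- 10        # assumed resolution of each feature
d <- c(1, 2, 5, 10, 18)       # number of features
n <- 8192                     # number of training patterns

cells <- bins_per_feature^d   # entries a full lookup table would need

# Even if every pattern fell in a different cell, at most this
# fraction of the table could be filled from the training data.
max_fraction_filled <- pmin(1, n / cells)

data.frame(d, cells, max_fraction_filled)
```

The table size grows exponentially in d while the training set stays fixed, so most cells are never observed. That is one way to see the "inadequate number of training samples" difficulty listed under Dimensionality (2).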