EE-589 Introduction to Neural Networks

Assistant Prof. Dr. Turgay IBRIKCI, Room # 305, (322) 338 6868 / 139, Wednesdays 9:00-12:00
Course web site: eembdersler.wordpress.com

Course Outline
The course is divided into two parts: theory and practice.
1. Theory covers basic topics in neural network theory and its application to supervised and unsupervised learning.
2. Practice deals with the basics of Matlab and the implementation of NN learning algorithms.

Course Grading
- Project: 40%
  - Abstract: 5% (Week 8, 08/11/2011)
  - Report & Presentation: 35% (report due Week 13/14, by CD or email, in paper-publication format; presentation in Week 14, 20 minutes)
- Final Exam: 20% (Week 15)
- Homeworks: 40% (at least 4 homeworks)

What you learn from the course
- How to approach a machine learning classification or clustering problem
- Basic knowledge of the common linear machine learning algorithms
- Basic knowledge of neural learning algorithms
- A good understanding of neural network algorithms

Academic Integrity
- All programming is to be done alone. Do not share code with anyone else in the course (looking at someone else's code counts as sharing).
- Homeworks will be compared to catch cheaters.
- The minimum penalty is a two-letter-grade drop in the course grade for everyone involved.

Where to go for help
- You can discuss your programs with anyone.
- Feel free to send your code to turgayibrikci@gmail.com and ask for help. Please write your name and course number in the subject line.
- Bring your code to office hours or to class.

What is a Pattern?
A pattern is a set of instances that:
- share some regularities and similarities,
- are repeatable,
- are observable, sometimes only partially, using sensors,
- may contain noise and distortions.

Examples of Patterns
- Texture patterns
- Image objects
- Speech patterns
- Text document category patterns
- News video
- Biological signals
- Many others
(Figure: male vs. female face patterns, from Yang et al., PAMI, May 2002.)

What is Pattern Recognition?
Pattern recognition (PR) is the scientific discipline concerned with the description and classification (recognition) of patterns (objects). PR techniques are an important component of intelligent systems and are used in many application domains, such as decision making and object and pattern classification.

Machine Perception
Build a machine that can recognize patterns:
- Speech recognition
- Fingerprint identification
- OCR (Optical Character Recognition)
- DNA sequence identification

(Based on lecture notes for E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004, V1.1.)

An Example: Fish Classification
Sort incoming fish on a conveyor belt according to species using optical sensing. The two species are sea bass and salmon.

Problem Analysis
Set up a camera and take some sample images to extract features:
- Length
- Lightness
- Width
- Number and shape of fins
- Position of the mouth, etc.
This is the set of all suggested features to explore for use in our classifier.

Preprocessing
Use a segmentation operation to isolate the fish from one another and from the background, so that a single fish can be extracted for the next step.

Feature extraction
Measure certain features of the fish to be classified. This is one of the most critical steps in the design of a pattern recognition system.

Classification
Select the length of the fish as a possible feature for discrimination.

Threshold decision boundary and cost relationship
Move the decision boundary toward smaller values of lightness in order to minimize the cost (that is, to reduce the number of sea bass that are classified as salmon). This is the task of decision theory; a small sketch follows below.
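The single-feature threshold classifier described above can be written down directly. The sketch below is only an illustration of the idea; the lightness values, thresholds, and the direction of the cost-driven shift are assumed numbers, not data from the slides.

    # Minimal sketch of the single-feature threshold classifier described above.
    # The lightness values and thresholds are hypothetical illustration values.

    def classify_by_lightness(lightness, threshold):
        """Label a fish 'salmon' if its lightness is below the threshold, else 'sea bass'."""
        return "salmon" if lightness < threshold else "sea bass"

    # If misclassifying sea bass as salmon is costly, the decision boundary is
    # moved toward smaller lightness values, as the slides suggest.
    neutral_threshold = 5.0          # balances the two error types (assumed value)
    cost_sensitive_threshold = 4.2   # shifted left: fewer sea bass end up labeled "salmon"

    for x in [3.1, 4.5, 5.8]:        # hypothetical lightness measurements
        print(x, classify_by_lightness(x, neutral_threshold),
                 classify_by_lightness(x, cost_sensitive_threshold))

Measurements between the two thresholds change label once the boundary is shifted, which is exactly the trade-off decision theory is asked to resolve.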

Adopting lightness and adding the width of the fish gives a two-dimensional feature vector
x^T = [x_1, x_2], where x_1 = lightness and x_2 = width (see the code sketch after the Design Cycle below).
We might add other features that are not correlated with the ones we already have. Care should be taken not to reduce performance by adding such noisy features. Ideally, the best decision boundary is the one that provides optimal performance on new data.

Recognition Systems
- Sensing: use of a transducer (camera or microphone). The PR system depends on the bandwidth, resolution, sensitivity, distortion, etc. of the transducer.
- Segmentation and grouping: patterns should be well separated and should not overlap.
- Feature extraction: discriminative features; features invariant with respect to translation, rotation, and scale.
- Classification: use the feature vector provided by the feature extractor to assign the object to a category.
- Post-processing: exploit context-dependent information beyond the target pattern itself to improve performance.

The Design Cycle
Data collection, feature choice, model choice, training, evaluation, computational complexity.
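A linear decision boundary over the two-feature vector x = [lightness, width] can be sketched as below. The weights and bias are invented for illustration; in practice they would be learned from training data.

    import numpy as np

    # Hedged sketch: a linear decision boundary over x = [lightness, width].
    # The weights and bias are assumed values, not learned parameters.

    w = np.array([-1.0, 2.5])   # assumed weights for [lightness, width]
    b = -1.0                    # assumed bias

    def classify(x):
        """'salmon' on one side of the line w.x + b = 0, 'sea bass' on the other."""
        return "salmon" if np.dot(w, x) + b > 0 else "sea bass"

    print(classify(np.array([4.0, 2.5])))   # hypothetical fish: lightness 4.0, width 2.5
    print(classify(np.array([6.5, 1.0])))   # hypothetical fish: lightness 6.5, width 1.0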

Data Collection
How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Feature Choice
Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.

Model Choice
We may be unsatisfied with the performance of our fish classifier and want to jump to another class of model.

Training
Use data to determine the classifier. There are many different procedures for training classifiers and choosing models.

Evaluation
Measure the error rate for:
- different feature sets,
- different training methods,
- different training and test data sets.

Computational Complexity
What is the trade-off between computational ease and performance? How does an algorithm scale as a function of the number of features, patterns, or categories?

What are Neural Networks?
Introduction:
- Simple computational elements forming a large network
- Emphasis on learning (pattern recognition)
- Local computation (neurons)
- The definition of NNs is vague
- Often, but not always, inspired by the biological brain

What is an (artificial) neural network?
- A set of nodes (units, neurons, processing elements)
- Each node has inputs and an output
- Each node performs a simple computation given by its node function
- Weighted connections between nodes
- Connectivity gives the structure/architecture of the net
- What can be computed by a NN is primarily determined by the connections and their weights
- A very much simplified version of the networks of neurons in animal nervous systems
(A minimal node computation is sketched below.)
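As a concrete picture of a single node, the sketch below computes a weighted sum of the inputs and passes it through a node function. The logistic node function and all numeric values are illustrative assumptions.

    import math

    # Minimal sketch of the node computation described above: a weighted sum of
    # the inputs passed through a node (activation) function. Values are made up.

    def node_output(inputs, weights, bias):
        weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1.0 / (1.0 + math.exp(-weighted_sum))   # logistic node function

    print(node_output(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, -0.5], bias=0.1))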

ANN versus biological NN
- Nodes (input, output, node function) correspond to the cell body (signals from other neurons, firing frequency, firing mechanism).
- Connections (connection strength) correspond to synapses (synaptic strength).
Key properties:
- Highly parallel: simple local computation (at the neuron level) achieves global results as an emergent property of the interaction (at the network level).
- Pattern directed (the meaning of individual nodes exists only in the context of a pattern).
- Fault-tolerant / graceful degradation.
- Learning/adaptation plays an important role.

History
The roots of work on NNs are in:
- Neurobiological studies (more than a century ago): How do nerves behave when stimulated by different magnitudes of electric current? Is there a minimal threshold needed for nerves to be activated? Given that no single nerve cell is long enough, how do different nerve cells communicate with each other?
- Psychological studies: How do animals learn, forget, recognize, and perform other types of tasks? Psycho-physical experiments helped to understand how individual neurons and groups of neurons work.
McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.

Prehistory
- Golgi and Ramón y Cajal study the nervous system and discover neurons (end of the 19th century).

History (brief)
- McCulloch and Pitts (1943): the first artificial neural network, with binary neurons
- Hebb (1949): learning = neurons that fire together wire together
- Minsky (1954): neural networks for reinforcement learning
- Taylor (1956): associative memory
- Rosenblatt (1958): the perceptron, a single neuron for supervised learning
- Widrow and Hoff (1960): Adaline
- Minsky and Papert (1969): limitations of single-layer perceptrons (they erroneously claimed that the limitations also hold for multi-layer perceptrons)
- Stagnation in the 1970s: individual researchers continue laying foundations
  - von der Malsburg (1973): competitive learning and self-organization
- Big neural-nets boom in the 1980s
  - Grossberg: adaptive resonance theory (ART)
  - Hopfield: the Hopfield network
  - Kohonen: the self-organising map (SOM)
  - Oja: neural principal component analysis (PCA)
  - Ackley, Hinton and Sejnowski: the Boltzmann machine
  - Rumelhart, Hinton and Williams: backpropagation
- Diversification during the 1990s and 2000s:
  - Machine learning: mathematical rigor, Bayesian methods, information theory, support vector machines, ...
  - Computational neuroscience: the workings of most subsystems of the brain are understood at some level; research ranges from low-level compartmental models of individual neurons to large-scale brain models

Course Topics: Learning Tasks
- Supervised
  - Data: labeled examples (input, desired output)
  - Tasks: classification, pattern recognition, regression
  - NN models: perceptron, Adaline, feed-forward NN, radial basis function networks, support vector machines
- Unsupervised
  - Data: unlabeled examples (different realizations of the input)
  - Tasks: clustering, content-addressable memory
  - NN models: self-organizing maps (SOM), Hopfield networks
(A minimal perceptron sketch follows below.)
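As a minimal illustration of the simplest supervised model listed above (Rosenblatt's perceptron), the sketch below learns the logical AND function. The learning rate, epoch count, and the AND task itself are arbitrary illustration choices, not material from the slides.

    import numpy as np

    # Hedged sketch of perceptron learning (Rosenblatt, 1958) on the AND function.
    # Learning rate and number of epochs are arbitrary illustration choices.

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
    t = np.array([0, 0, 0, 1])                       # desired outputs (AND)

    w = np.zeros(2)
    b = 0.0
    eta = 0.1                                        # learning rate

    for epoch in range(20):
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) + b > 0 else 0     # step activation
            w += eta * (target - y) * x              # perceptron weight update
            b += eta * (target - y)

    print(w, b)   # weights and bias after training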

What is learning?
- "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time." (Herbert Simon)
- "Learning is any process by which a system improves performance from experience." (Herbert Simon)
- "Learning is constructing or modifying representations of what is being experienced." (Ryszard Michalski)
- "Learning is making useful changes in our minds." (Marvin Minsky)

Two types of learning
- Supervised: the machine has access to a teacher who corrects it.
- Unsupervised: no access to a teacher. Instead, the machine must search for order in the environment.

Machine Learning - Example
The mind-reading game (written by Y. Freund and R. Schapire): repeat 200 times: the computer guesses whether you'll type 0 or 1; you type 0 or 1. The computer is right much more than half the time. How?
One of my favorite machine learning sites: http://www.20q.net/

Why learn?
- Fill in skeletal or incomplete specifications about a domain. Large, complex NN systems cannot be completely derived by hand and require dynamic updating to incorporate new information.
- Learning new characteristics expands the domain of expertise and lessens the brittleness of the system.
- Discover new things or structure previously unknown to humans (examples: data mining, scientific discovery).
- Understand and improve the efficiency of human learning.

Why Study Machine Learning? Cognitive Science
Computational studies of learning may help us understand learning in humans and other biological organisms.
Hebbian neural learning: "Neurons that fire together, wire together." (A minimal sketch of the Hebbian update follows below.)
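The Hebbian rule quoted above has a very compact form: the weight between two units grows in proportion to the product of their activities. The sketch below is illustrative only; the learning rate and activity values are assumed numbers.

    # Hedged sketch of the Hebbian rule ("neurons that fire together, wire together").
    # Learning rate and activity values are illustrative assumptions.

    eta = 0.05          # learning rate (assumed)
    w = 0.0             # connection strength between a pre- and post-synaptic unit

    activity_pairs = [(1.0, 0.8), (0.9, 1.0), (0.1, 0.0)]   # (pre, post) firing levels
    for pre, post in activity_pairs:
        w += eta * pre * post    # Hebbian update: correlated firing strengthens w

    print(w)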

Related Disciplines
- Artificial intelligence
- Pattern recognition
- Data mining
- Probability and statistics
- Information theory
- Psychology (developmental, cognitive)
- Neurobiology
- Linguistics
- Philosophy

NNs: goal and design
Knowledge about the learning task is given in the form of examples called training examples. A NN is specified by:
- an architecture: a set of neurons and the links connecting them, where each link has a weight;
- a neuron model: the information-processing unit of the NN;
- a learning algorithm: used to train the NN by modifying the weights so as to solve the particular learning task correctly on the training examples.
The aim is to obtain a NN that generalizes well, that is, one that behaves correctly on new instances of the learning task.

Dimensions of a Neural Network
- network architectures
- types of neurons
- learning algorithms
- applications

Network architectures
There are three different classes of network architecture:
- single-layer feed-forward
- multi-layer feed-forward
- recurrent
In feed-forward networks the neurons are organized in acyclic layers. The architecture of a neural network is linked with the learning algorithm used to train it.

Single-layer feed-forward
An input layer of source nodes projects directly onto an output layer of neurons.

Multi-layer feed-forward
Example: a 3-4-2 network, with an input layer, one hidden layer, and an output layer. (A forward-pass sketch for this 3-4-2 network follows below.)
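The forward pass through the 3-4-2 multi-layer feed-forward network mentioned above can be sketched in a few lines. The random weights stand in for trained parameters, and the sigmoid activation and input values are illustrative assumptions.

    import numpy as np

    # Hedged sketch of a forward pass through a 3-4-2 feed-forward network:
    # 3 inputs, one hidden layer of 4 neurons, 2 outputs. Weights are random
    # placeholders for trained parameters.

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input  -> hidden
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # hidden -> output

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def forward(x):
        h = sigmoid(W1 @ x + b1)      # hidden layer activations
        return sigmoid(W2 @ h + b2)   # output layer activations

    print(forward(np.array([0.2, -0.5, 1.0])))   # hypothetical input vector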

Recurrent networks
A recurrent network with hidden neurons uses the unit-delay operator z^-1 to model a dynamic system: delayed copies of signals are fed back through z^-1 elements between the input, hidden, and output layers.

The Neuron
The neuron is the basic information-processing unit of a NN. It consists of:
1. A set of links describing the neuron inputs, with weights w_1, w_2, ..., w_m.
2. An adder function (linear combiner) computing the weighted sum of the (real-valued) inputs:
   u = sum_{j=1}^{m} w_j x_j
3. An activation function (squashing function) φ for limiting the amplitude of the neuron output:
   y = φ(u + b)
(Diagram: input values x_1, ..., x_m, weights w_1, ..., w_m, bias b, summing function, local field v, activation function φ(.), output y.)

Bias of a Neuron
The bias b has the effect of applying an affine transformation to the weighted sum u:
   v = u + b
v is called the induced local field of the neuron.
(Figure: for u = x_1 - x_2, the lines x_1 - x_2 = -1, 0, 1 in the (x_1, x_2) plane show how the bias shifts the decision line.)

Bias as extra input
The bias is an external parameter of the neuron. It can be modeled by adding an extra input x_0 = +1 with synaptic weight w_0 = b, so that
   v = sum_{j=0}^{m} w_j x_j

Neuron Models
The choice of the activation function φ determines the neuron model. Examples:
- Step function:
   φ(v) = a if v < c, b if v >= c
- Ramp function:
   φ(v) = a if v < c; b if v > d; a + (v - c)(b - a)/(d - c) otherwise
- Sigmoid function, with parameters z, x, y:
   φ(v) = z + 1 / (1 + exp(-x v + y))
- Gaussian function:
   φ(v) = (1 / (sqrt(2π) σ)) exp(-(1/2) ((v - µ)/σ)^2)
(Implementations of these activation functions are sketched below.)
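The neuron models above translate directly into code: compute the induced local field, then apply one of the activation functions. All parameter defaults (a, b, c, d, z, x, y, mu, sigma) and the input values are illustrative assumptions.

    import math

    # Hedged sketch of the neuron models above: the induced local field
    # v = sum_j w_j x_j + b passed through different activation functions.
    # Parameter defaults are illustrative, not prescribed by the slides.

    def local_field(inputs, weights, bias):
        return sum(w * xi for w, xi in zip(weights, inputs)) + bias

    def step(v, a=0.0, b=1.0, c=0.0):
        return a if v < c else b

    def ramp(v, a=0.0, b=1.0, c=-1.0, d=1.0):
        if v < c:
            return a
        if v > d:
            return b
        return a + (v - c) * (b - a) / (d - c)

    def sigmoid(v, z=0.0, x=1.0, y=0.0):
        return z + 1.0 / (1.0 + math.exp(-x * v + y))

    def gaussian(v, mu=0.0, sigma=1.0):
        return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

    v = local_field([0.5, -0.2], [1.0, 2.0], bias=0.3)
    print(step(v), ramp(v), sigmoid(v), gaussian(v))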

Where are NNs used?
- Recognizing and matching complicated, vague, or incomplete patterns
- Unreliable data; problems with noisy data
- Prediction
- Classification
- Data association
- Data conceptualization
- Filtering
- Planning

Applications
- Prediction: learning from past experience, e.g. pick the best stocks in the market, predict the weather, identify people at risk of cancer
- Classification: image processing, predicting bankruptcy for credit card companies, risk assessment
- Recognition / pattern recognition: SNOOPE (bomb detector in U.S. airports), character recognition, handwriting (processing checks)
- Data association: not only identify the characters that were scanned, but also identify when the scanner is not working properly
- Data conceptualization: infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
- Data filtering: e.g. take the noise out of a telephone signal, signal smoothing
- Planning: unknown environments, noisy sensor data; a fairly new approach to planning

Strengths of a Neural Network
- Power: models complex functions; nonlinearity is built into the network
- Ease of use: learns by example; very little user domain-specific expertise is needed
- Intuitively appealing: based on a model of biology; will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be done using traditional computing techniques, but they can do some things that would otherwise be very difficult.

General Advantages and Disadvantages
Advantages:
- Adapt to unknown situations
- Robustness: fault tolerance due to network redundancy
- Autonomous learning and generalization
Disadvantages:
- Not exact
- Large complexity of the network structure
For motion planning?

Resources: Datasets
- UCI Repository: http://www.ics.uci.edu/~mlearn/mlrepository.html
- UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
- Statlib: http://lib.stat.cmu.edu/
- Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
- Journal of Machine Learning Research: www.jmlr.org
- Machine Learning
- Neural Computation
- Neural Networks
- IEEE Transactions on Neural Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Annals of Statistics
- Journal of the American Statistical Association
- ...

Resources: Conferences
- International Conference on Machine Learning (ICML); ICML05: http://icml.ais.fraunhofer.de/
- European Conference on Machine Learning (ECML); ECML05: http://ecmlpkdd05.liacc.up.pt/
- Neural Information Processing Systems (NIPS); NIPS05: http://nips.cc/
- Uncertainty in Artificial Intelligence (UAI); UAI05: http://www.cs.toronto.edu/uai2005/
- Computational Learning Theory (COLT); COLT05: http://learningtheory.org/colt2005/
- International Joint Conference on Artificial Intelligence (IJCAI); IJCAI05: http://ijcai05.csd.abdn.ac.uk/
- International Conference on Artificial Neural Networks (ICANN, Europe); ICANN05: http://www.ibspan.waw.pl/icann-2005/
- ...

Resources: Software
- PrTools: a pattern recognition toolbox from Delft University of Technology (PrTools 4.0)

Terminology
- Training example: an example of the form (x, y). x is usually a vector of features; y is called the class label. We index the features by j, so x_j is the j-th feature of x. The number of features is n.
- Target function: the true function f, the true conditional distribution P(y | x), or the true joint distribution P(x, y).
- Hypothesis: a proposed function or distribution h believed to be similar to f or P.
- Concept: a boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept. Examples for which f(x) = 0 are called negative examples or negative instances.
- Classifier: a discrete-valued function. The possible values f(x) in {1, ..., k} are called the classes or class labels.
- Hypothesis space: the space of all hypotheses that can, in principle, be output by a particular learning algorithm.
- Version space: the space of all hypotheses in the hypothesis space that have not yet been ruled out by a training example.
- Training sample (or training set, or training data): a set of N training examples drawn according to P(x, y).
- Test set: a set of examples used to evaluate a proposed hypothesis h.
- Validation set: a set of examples (typically a subset of the training set) used to guide the learning algorithm and prevent overfitting.
(A small data-splitting sketch follows below.)

(Based on lecture notes for E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004, V1.1.)
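The training/validation/test terminology above corresponds to a simple partition of the available examples. The sketch below uses synthetic data and arbitrary 60/20/20 proportions purely for illustration.

    import random

    # Hedged sketch of the data-splitting terminology above: a training sample is
    # divided into a training set, a validation set (to guide learning and prevent
    # overfitting), and a held-out test set (to evaluate the final hypothesis).
    # The synthetic dataset and the 60/20/20 split are arbitrary illustration choices.

    examples = [(random.random(), random.choice([0, 1])) for _ in range(100)]  # (x, y) pairs
    random.shuffle(examples)

    train      = examples[:60]
    validation = examples[60:80]
    test       = examples[80:]

    print(len(train), len(validation), len(test))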