EE-589 Introduction to Neural Networks
Assistant Prof. Dr. Turgay IBRIKCI
Room # 305, (322) 338 6868 / 139
Wednesdays 9:00-12:00
eembdersler.wordpress.com

Course Outline
The course is divided into two parts: theory and practice.
1. Theory covers basic topics in neural network theory and its application to supervised and unsupervised learning.
2. Practice deals with the basics of Matlab and the implementation of NN learning algorithms.

Course Grading
- Project 40%
  - Abstract 5% (Week 8 - 08/11/2011)
  - Report & Presentation 35% (Week 13/14 - CD or email; the report will be in paper-publication format; presentation in Week 14, 20 mins)
- Final Exam 20% (Week 15)
- Homeworks 40% (at least 4 homeworks)

What You Learn from the Course
- How to approach a machine learning classification or clustering problem
- Basic knowledge of the common linear machine learning algorithms
- Basic knowledge of neural learning algorithms
- A good understanding of neural network algorithms

Academic Integrity
- All programming is to be done alone!
- Do not share code with anyone else in the course (looking at code counts as sharing)!
- Homeworks will be compared in order to catch cheaters.
- The minimum penalty is a two-letter-grade drop for the course for everyone involved.

Where to Go for Help
- You can discuss your programs with anyone!
- Feel free to send your code to turgayibrikci@gmail.com and ask me for help. Please put your name and course number in the subject line.
- Bring your code to office hours or to class.
What is a Pattern?
A set of instances that:
- share some regularities and similarities
- is repeatable
- is observable, sometimes only partially, using sensors
- may have noise and distortions

Examples of Patterns
- Texture patterns
- Image objects
- Speech patterns
- Text document category patterns
- News video
- Biological signals
- Many others
- Male vs. female face patterns (from Yang et al., PAMI, May 2002)

What is Pattern Recognition?
- Pattern recognition (PR) is the scientific discipline that concerns the description and classification (recognition) of patterns (objects).
- PR techniques are an important component of intelligent systems and are used in many application domains:
  - decision making
  - object and pattern classification

Machine Perception
Build a machine that can recognize patterns:
- Speech recognition
- Fingerprint identification
- OCR (Optical Character Recognition)
- DNA sequence identification

Based on Lecture Notes for E. Alpaydın 2004, Introduction to Machine Learning, The MIT Press (V1.1)
An Example: Fish Classification
Sort incoming fish on a conveyor according to species using optical sensing:
- sea bass
- salmon

Problem Analysis
Set up a camera and take some sample images to extract features:
- Length
- Lightness
- Width
- Number and shape of fins
- Position of the mouth, etc.
This is the set of all suggested features to explore for use in our classifier!

Preprocessing
- Use a segmentation operation to isolate the fishes from one another and from the background
- Extract a single fish to pass to the next step

Feature Extraction
- Measure certain features of the fish to be classified
- This is one of the most critical steps in pattern recognition system design

Classification
- Select the length of the fish as a possible feature for discrimination
- In the classic version of this example, length alone separates the classes poorly, so lightness is adopted instead

Threshold Decision Boundary and Cost
- Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
- This is the task of decision theory
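To make the threshold idea concrete, here is a minimal sketch (not from the course; the lightness values and cost weights are invented for illustration) that sweeps a decision threshold and keeps the one minimizing a cost in which a sea bass classified as salmon counts double:

```python
import numpy as np

# Hypothetical lightness measurements (arbitrary units) for labeled fish.
salmon_lightness = np.array([2.1, 2.8, 3.0, 3.5, 4.0, 4.2])
seabass_lightness = np.array([4.1, 4.8, 5.2, 5.5, 6.0, 6.3])

# Cost weights: misclassifying sea bass as salmon is assumed to be worse.
COST_BASS_AS_SALMON = 2.0
COST_SALMON_AS_BASS = 1.0

def cost(threshold):
    # Decision rule: lightness below the threshold -> salmon, else sea bass.
    bass_as_salmon = np.sum(seabass_lightness < threshold)
    salmon_as_bass = np.sum(salmon_lightness >= threshold)
    return (COST_BASS_AS_SALMON * bass_as_salmon
            + COST_SALMON_AS_BASS * salmon_as_bass)

# Sweep candidate thresholds and keep the cheapest one.
candidates = np.linspace(2.0, 6.5, 200)
best = min(candidates, key=cost)
print(f"best threshold = {best:.2f}, cost = {cost(best):.1f}")
```

Weighting sea-bass errors more heavily pushes the chosen threshold toward smaller lightness values, which is the effect described on the slide.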
Adopting a Second Feature
Adopt the lightness and add the width of the fish:
  x^T = [x_1, x_2]   (x_1 = lightness, x_2 = width)
- We might add other features that are not correlated with the ones we already have.
- A precaution should be taken not to reduce the performance by adding such noisy features.
- Ideally, the best decision boundary is the one that provides optimal performance (illustrated in the course figure).

Recognition Systems
Sensing
- Use of a transducer (camera or microphone)
- The PR system depends on the bandwidth, resolution, sensitivity, distortion, etc. of the transducer
Segmentation and grouping
- Patterns should be well separated and should not overlap
Feature extraction
- Discriminative features
- Features invariant with respect to translation, rotation, and scale
Classification
- Use the feature vector provided by the feature extractor to assign the object to a category
Post-processing
- Exploit context-dependent information other than the target pattern itself to improve performance

The Design Cycle
- Data collection
- Feature choice
- Model choice
- Training
- Evaluation
- Computational complexity
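As a sketch of the two-feature version (again with invented measurements, and a hand-picked rather than optimal boundary), a linear decision rule on x = [lightness, width] might look like this:

```python
import numpy as np

# Hypothetical [lightness, width] measurements for labeled fish.
salmon = np.array([[2.5, 10.0], [3.0, 11.5], [3.8, 12.0], [4.0, 13.0]])
seabass = np.array([[4.5, 14.0], [5.0, 15.5], [5.5, 14.5], [6.0, 16.0]])

# A hand-picked linear decision boundary w . x + b = 0:
# points with w . x + b < 0 are called salmon, otherwise sea bass.
w = np.array([1.0, 0.5])
b = -11.0

def predict(x):
    return "salmon" if np.dot(w, x) + b < 0 else "sea bass"

errors = (sum(predict(x) != "salmon" for x in salmon)
          + sum(predict(x) != "sea bass" for x in seabass))
print(f"training errors: {errors} / {len(salmon) + len(seabass)}")
```

Training would adjust w and b automatically; the point here is only that the decision boundary is now a line in the two-dimensional feature space rather than a single threshold.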
Data Collection
- How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Feature Choice
- Depends on the characteristics of the problem domain
- Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise

Model Choice
- Unsatisfied with the performance of our fish classifier? We may want to jump to another class of model.

Training
- Use the data to determine the classifier
- There are many different procedures for training classifiers and choosing models

Evaluation
- Measure the error rate for:
  - different feature sets
  - different training methods
  - different training and test data sets
- (a minimal evaluation sketch follows at the end of this page)

Computational Complexity
- What is the trade-off between computational ease and performance?
- How does an algorithm scale as a function of the number of features, patterns, or categories?

What are Neural Networks?
- Simple computational elements forming a large network
- Emphasis on learning (pattern recognition)
- Local computation (neurons)
- The definition of NNs is vague
- Often, but not always, inspired by the biological brain

What is an (Artificial) Neural Network?
- A set of nodes (units, neurons, processing elements)
  - Each node has input and output
  - Each node performs a simple computation by its node function
- Weighted connections between nodes
  - Connectivity gives the structure/architecture of the net
  - What can be computed by a NN is primarily determined by the connections and their weights
- A very much simplified version of the networks of neurons in animal nervous systems
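Returning to the Evaluation step above: a minimal sketch of measuring the error rate on a held-out test set, with synthetic 1-D data standing in for real measurements (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D dataset: feature values with class labels 0/1.
x = np.concatenate([rng.normal(3.0, 0.8, 100), rng.normal(5.0, 0.8, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Shuffle, then hold out 25% as a test set.
idx = rng.permutation(len(x))
x, y = x[idx], y[idx]
split = int(0.75 * len(x))
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]

# "Train" a threshold classifier: midpoint of the two class means.
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

# Evaluate the error rate on data the classifier has never seen.
pred = (x_test >= threshold).astype(float)
error_rate = np.mean(pred != y_test)
print(f"threshold = {threshold:.2f}, test error rate = {error_rate:.3f}")
```

Repeating this for different feature sets, training methods, and train/test splits gives the comparisons the Evaluation step calls for.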
ANN vs. Biological NN
- Nodes (input, output, node function)  <->  Cell body (signals from other neurons, firing frequency, firing mechanism)
- Connections (connection strength)  <->  Synapses (synaptic strength)

- Highly parallel, simple local computation (at the neuron level) achieves global results as an emerging property of the interaction (at the network level)
- Pattern directed (individual nodes have meaning only in the context of a pattern)
- Fault-tolerant / gracefully degrading
- Learning/adaptation plays an important role

History: Roots
Roots of work on NN are in:
- Neurobiological studies (more than a century ago):
  - How do nerves behave when stimulated by different magnitudes of electric current?
  - Is there a minimal threshold needed for nerves to be activated?
  - Given that no single nerve cell is long enough, how do different nerve cells communicate with each other?
- Psychological studies:
  - How do animals learn, forget, recognize, and perform other types of tasks?
  - Psycho-physical experiments helped to understand how individual neurons and groups of neurons work.
- McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.

History
Prehistory:
- Golgi and Ramón y Cajal study the nervous system and discover neurons (end of the 19th century)
History (brief):
- McCulloch and Pitts (1943): the first artificial neural network, with binary neurons
- Hebb (1949): learning = neurons that fire together wire together
- Minsky (1954): neural networks for reinforcement learning
- Taylor (1956): associative memory
- Rosenblatt (1958): perceptron, a single neuron for supervised learning
- Widrow and Hoff (1960): Adaline
- Minsky and Papert (1969): limitations of single-layer perceptrons (and they erroneously claimed that the limitations hold for multi-layer perceptrons)
Stagnation in the 70's:
- Individual researchers continue laying foundations
- von der Malsburg (1973): competitive learning and self-organization
Big neural-nets boom in the 80's:
- Grossberg: adaptive resonance theory (ART)
- Hopfield: Hopfield network
- Kohonen: self-organising map (SOM)

History (continued)
- Oja: neural principal component analysis (PCA)
- Ackley, Hinton, and Sejnowski: Boltzmann machine
- Rumelhart, Hinton, and Williams: backpropagation
Diversification during the 90's and 2000's:
- Machine learning: mathematical rigor, Bayesian methods, information theory, support vector machines, ...
- Computational neuroscience: the workings of most subsystems of the brain are understood at some level; research ranges from low-level compartmental models of individual neurons to large-scale brain models

Course Topics: Learning Tasks
Supervised
- Data: labeled examples (input, desired output)
- Tasks: classification, pattern recognition, regression
- NN models: perceptron, Adaline, feed-forward NN, radial basis functions, support vector machines
Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Tasks: clustering, content-addressable memory
- NN models: self-organizing maps (SOM), Hopfield networks
What is Learning?
- "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time." (Herbert Simon)
- "Learning is any process by which a system improves performance from experience." (Herbert Simon)
- "Learning is constructing or modifying representations of what is being experienced." (Ryszard Michalski)
- "Learning is making useful changes in our minds." (Marvin Minsky)

Two Types of Learning
- Supervised: the machine has access to a teacher who corrects it.
- Unsupervised: no access to a teacher; instead, the machine must search for order in the environment.

Machine Learning - Example
The mind-reading game (written by Y. Freund and R. Schapire):
- Repeat 200 times:
  - the computer guesses whether you'll type 0 or 1
  - you type 0 or 1
- The computer is right much more than half the time. How? (One possible answer is sketched after this page.)

Machine Learning - Example
One of my favorite machine learning sites: http://www.20q.net/

Why Learn?
- Fill in skeletal or incomplete specifications about a domain
  - Large, complex NN systems cannot be completely derived by hand and require dynamic updating to incorporate new information.
  - Learning new characteristics expands the domain of expertise and lessens the brittleness of the system.
- Discover new things or structure previously unknown to humans
  - Examples: data mining, scientific discovery
- Understand and improve the efficiency of human learning

Why Study Machine Learning?
Cognitive science:
- Computational studies of learning may help us understand learning in humans and other biological organisms.
- Hebbian neural learning: "Neurons that fire together, wire together."
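The slides do not give the game's algorithm; the sketch below shows one classic way such a predictor can work, assuming an order-2 context model that counts which bit you tended to type after each pattern of your previous two bits (the simulated "player" here is a stand-in for a human):

```python
from collections import defaultdict
import random

# Counts of which bit followed each 2-bit context: counts[context][bit].
counts = defaultdict(lambda: [0, 0])
context = (0, 0)
score = 0
N = 200

for _ in range(N):
    # Guess the bit seen more often after the current context
    # (break ties randomly).
    c = counts[context]
    guess = 0 if c[0] > c[1] else 1 if c[1] > c[0] else random.randint(0, 1)

    # Here a human would type; we simulate a biased, habit-driven player.
    user_bit = random.choices([0, 1], weights=[1, 2])[0]

    score += (guess == user_bit)
    counts[context][user_bit] += 1      # learn from this round
    context = (context[1], user_bit)    # slide the context window

print(f"computer guessed right {score}/{N} times")
```

People are poor random-bit generators, so the context counts quickly pick up their habits; that is why the computer can be right much more than half the time.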
Related Disciplines
- Artificial intelligence
- Pattern recognition
- Data mining
- Probability and statistics
- Information theory
- Psychology (developmental, cognitive)
- Neurobiology
- Linguistics
- Philosophy

NNs: Goal and Design
Knowledge about the learning task is given in the form of examples called training examples.
A NN is specified by:
- an architecture: a set of neurons and links connecting neurons; each link has a weight
- a neuron model: the information processing unit of the NN
- a learning algorithm: used for training the NN by modifying the weights in order to solve the particular learning task correctly on the training examples
The aim is to obtain a NN that generalizes well, that is, that behaves correctly on new instances of the learning task.

Dimensions of a Neural Network
- network architectures
- types of neurons
- learning algorithms
- applications

Network Architectures
Three different classes of network architecture:
- single-layer feed-forward (neurons organized in acyclic layers)
- multi-layer feed-forward (neurons organized in acyclic layers)
- recurrent
The architecture of a neural network is linked with the learning algorithm used to train it.

Single-Layer Feed-Forward
- An input layer of source nodes projects directly onto an output layer of neurons.

Multi-Layer Feed-Forward
- Example: a 3-4-2 network, with an input layer, one hidden layer, and an output layer.
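To make the 3-4-2 multi-layer feed-forward example concrete, here is a minimal forward-pass sketch (the weights are random placeholders and the sigmoid activation is one common choice; the slide itself fixes neither):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# A 3-4-2 feed-forward network: 3 inputs, 4 hidden neurons, 2 outputs.
# Weights and biases are random placeholders (training would set them).
W1 = rng.normal(size=(4, 3))   # input -> hidden
b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4))   # hidden -> output
b2 = rng.normal(size=2)

def forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    y = sigmoid(W2 @ h + b2)   # output-layer activations
    return y

x = np.array([0.5, -1.0, 2.0])   # one 3-dimensional input pattern
print(forward(x))                 # two output values in (0, 1)
```

A recurrent network would differ by feeding some outputs back as inputs through unit delays, as the next page describes.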
Recurrent Network
- Recurrent network with hidden neurons: the unit-delay operator z^-1 is used to model a dynamic system
- (Figure: input, hidden, and output layers with feedback connections through z^-1 unit delays)

The Neuron
(Figure: input values x_1, x_2, ..., x_m enter through weights w_1, w_2, ..., w_m; together with the bias b, a summing function produces the local field v, and an activation function φ(·) produces the output y)

The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights w_1, w_2, ..., w_m
2. An adder function (linear combiner) computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1..m} w_j x_j
3. An activation function (squashing function) φ for limiting the amplitude of the neuron output:
   y = φ(u + b)

Bias of a Neuron
- The bias b has the effect of applying an affine transformation to the weighted sum u:
  v = u + b
- v is called the induced local field of the neuron
- (Figure: for u = x_1 - x_2, the lines x_1 - x_2 = -1, x_1 - x_2 = 0, and x_1 - x_2 = 1 show how the bias shifts the decision boundary in the (x_1, x_2) plane)

Bias as Extra Input
The bias is an external parameter of the neuron. It can be modeled by adding an extra input x_0 = +1 with synaptic weight w_0 = b:
  v = Σ_{j=0..m} w_j x_j,  where w_0 = b and x_0 = +1

Neuron Models
The choice of φ determines the neuron model. Examples:
- Step function:
  φ(v) = a if v < c;  b if v ≥ c
- Ramp function:
  φ(v) = a if v < c;  b if v > d;  a + (v - c)(b - a)/(d - c) otherwise
- Sigmoid function, with parameters z, x, y:
  φ(v) = z + 1 / (1 + exp(-xv + y))
- Gaussian function:
  φ(v) = (1 / (√(2π) σ)) exp(-((v - µ)/σ)² / 2)
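A minimal sketch of the neuron model and the four activation functions in code (the default parameter values are arbitrary choices for illustration; the slides define the functions but not concrete settings):

```python
import numpy as np

def step(v, a=0.0, b=1.0, c=0.0):
    # a for v < c, b otherwise.
    return np.where(v < c, a, b)

def ramp(v, a=0.0, b=1.0, c=-1.0, d=1.0):
    # a below c, b above d, linear in between.
    return np.clip(a + (v - c) * (b - a) / (d - c), min(a, b), max(a, b))

def sigmoid(v, z=0.0, x=1.0, y=0.0):
    # z + 1 / (1 + exp(-x v + y)), as defined on the slide.
    return z + 1.0 / (1.0 + np.exp(-x * v + y))

def gaussian(v, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def neuron(x, w, b, phi=sigmoid):
    # Local field v = u + b, where u is the weighted sum of the inputs.
    v = np.dot(w, x) + b
    return phi(v)

x = np.array([1.0, -2.0, 0.5])    # inputs x_1 ... x_m
w = np.array([0.4, 0.1, -0.7])    # weights w_1 ... w_m
print(neuron(x, w, b=0.3))        # output y = phi(v)
```

Equivalently, the bias can be folded in as an extra input by prepending x_0 = +1 to x and w_0 = b to w, exactly as on the "Bias as Extra Input" slide.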
Where are NNs Used?
- Recognizing and matching complicated, vague, or incomplete patterns
- Problems where the data is unreliable or noisy
- Prediction
- Classification
- Data association
- Data conceptualization
- Filtering
- Planning

Applications
Prediction: learning from past experience
- Pick the best stocks in the market
- Predict weather
- Identify people with cancer risk
Classification
- Image processing
- Predict bankruptcy for credit card companies
- Risk assessment

Applications
Recognition
- Pattern recognition: SNOOPE (bomb detector in U.S. airports)
- Character recognition
- Handwriting: processing checks
Data association
- Not only identify the characters that were scanned, but also identify when the scanner is not working properly

Applications
Data conceptualization
- Infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
Data filtering
- e.g. take the noise out of a telephone signal; signal smoothing
Planning
- Unknown environments
- Sensor data is noisy
- A fairly new approach to planning

Strengths of a Neural Network
- Power: models complex functions; nonlinearity is built into the network
- Ease of use:
  - learns by example
  - very little user domain-specific expertise is needed
- Intuitively appealing: based on a model of biology; will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things that would otherwise be very difficult.

General Advantages and Disadvantages
Advantages
- Adapt to unknown situations
- Robustness: fault tolerance due to network redundancy
- Autonomous learning and generalization
Disadvantages
- Not exact
- Large complexity of the network structure
- For motion planning?
Resources: Datasets
- UCI Repository: http://www.ics.uci.edu/~mlearn/mlrepository.html
- UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
- Statlib: http://lib.stat.cmu.edu/
- Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
- Journal of Machine Learning Research: www.jmlr.org
- Machine Learning
- Neural Computation
- Neural Networks
- IEEE Transactions on Neural Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Annals of Statistics
- Journal of the American Statistical Association
- ...

Resources: Conferences
- International Conference on Machine Learning (ICML); ICML05: http://icml.ais.fraunhofer.de/
- European Conference on Machine Learning (ECML); ECML05: http://ecmlpkdd05.liacc.up.pt/
- Neural Information Processing Systems (NIPS); NIPS05: http://nips.cc/
- Uncertainty in Artificial Intelligence (UAI); UAI05: http://www.cs.toronto.edu/uai2005/
- Computational Learning Theory (COLT); COLT05: http://learningtheory.org/colt2005/
- International Joint Conference on Artificial Intelligence (IJCAI); IJCAI05: http://ijcai05.csd.abdn.ac.uk/
- International Conference on Artificial Neural Networks (Europe); ICANN05: http://www.ibspan.waw.pl/icann-2005/
- ...

PRTools
- Pattern Recognition Toolbox from Delft University of Technology (PRTools 4.0)

Terminology
- Training example: an example of the form (x, y). x is usually a vector of features; y is called the "class label". We index the features by j, hence x_j is the j-th feature of x. The number of features is n.
- Target function: the true function f, the true conditional distribution P(y | x), or the true joint distribution P(x, y).
- Hypothesis: a proposed function or distribution h believed to be similar to f or P.
- Concept: a boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept; examples for which f(x) = 0 are called negative examples or negative instances.
- Classifier: a discrete-valued function. The possible values f(x) ∈ {1, ..., k} are called the classes or class labels.
- Hypothesis space: the space of all hypotheses that can, in principle, be output by a particular learning algorithm.
- Version space: the space of all hypotheses in the hypothesis space that have not yet been ruled out by a training example.
- Training sample (or training set, or training data): a set of N training examples drawn according to P(x, y).
- Test set: a set of examples used to evaluate a proposed hypothesis h.
- Validation set: a set of examples (typically a subset of the training set) used to guide the learning algorithm and prevent overfitting.

Based on Lecture Notes for E. Alpaydın 2004, Introduction to Machine Learning, The MIT Press (V1.1)